Abstract

We introduce the notion of multiscale covariance tensor fields (CTF)
associated with Euclidean random variables as a gateway to the shape of
their distributions. Multiscale CTFs quantify variation of the data about
every point in the data landscape at all spatial scales, unlike the usual
covariance tensor that only quantifies global variation about the mean.
Empirical forms of localized covariance previously have been used in data
analysis and visualization, for example, in local principal component
analysis, but we develop a framework for the systematic treatment of
theoretical questions and mathematical analysis of computational models.
We prove strong stability theorems with respect to the Wasserstein
distance between probability measures, obtain consistency results for
estimators, as well as bounds on the rate of convergence of empirical
CTFs. These results show that CTFs are robust to sampling, noise and
outliers. We provide numerous illustrations of how CTFs let us extract
shape from data and also apply CTFs to manifold clustering, the problem of
categorizing data points according to their noisy membership in a
collection of possibly intersecting smooth submanifolds of Euclidean
space. We prove that the proposed manifold clustering method is stable and
carry out several experiments to illustrate the method.

Keywords: shape of data, multiscale data analysis, covariance fields,
Fréchet functions, manifold clustering


1. Introduction 

Probing, analyzing and visualizing the shape of complex data are
challenges that are magnified by the intricate dependence of their
structural properties, as basic as dimensionality, on location and scale
(cf. [1]). As such, resolving and integrating the geometry and topology of
data across scales are problems of foremost importance. In this paper, we
develop the notion of multiscale covariance tensor fields (CTF) associated
with Euclidean random variables and show that many properties of the shape
of their distributions become accessible through CTFs, which provide
stable representations that can be estimated reliably from data.

For a random vector $y \in \mathbb{R}^d$, scale dependence is controlled by
a kernel function $K(x, y, \sigma) \geq 0$, where $x, y \in \mathbb{R}^d$
and $\sigma > 0$ is the scale parameter. The idea is that from the
standpoint of $x$, at scale $\sigma > 0$, the kernel masks the distribution
by attributing weight $K(x, y, \sigma)$ to data located at $y$, creating a
windowing effect. More simply put, $K(x, y, \sigma)$ quantifies how well an
observer at $x$ sees data at $y$ at scale $\sigma$. Covariation of the
weighted data is measured relative to every point $x \in \mathbb{R}^d$, not
just about the mean as is common practice, thus giving rise to a multiscale
covariance field. Special cases of these covariance fields were introduced
in [2], targeting applications to such problems as detection of local
scales and feature rich points in shapes. Here we present a more
systematic treatment that includes a broader formulation of multiscale
CTFs, stability theorems that ensure that properties of probability
measures derived from multiscale CTFs are robust, as well as consistency
results and convergence rates for empirical CTFs. We prove stability of
CTFs with respect to the Wasserstein distance between probability
measures, a metric that is finding uses in an ever expanding landscape of
problems and whose origins are in optimal transport theory [3, 4]. Since
Wasserstein distance metrizes weak convergence of probability measures, we
obtain a strong stability result that ensures that if two probability
distributions are similar in a weak sense, then their multiscale CTFs are
uniformly close over the entire domain. Convergence rates are derived from
the stability theorems and results by Fournier and Guillin [5] and
Garcia-Trillos and Slepčev [6] on convergence of empirical measures. The
standard covariance tensor of a random vector $y \in \mathbb{R}^d$
quantifies covariation of $y$ about the mean, but may be extended to a full
covariance field by considering covariation about arbitrary points.
Nonetheless, this field provides no information about the organization of
the data other than that already contained in the covariance about the
mean. Thus, a localized formulation is essential for gaining additional
insight into the shape of data.

The trace of a multiscale CTF is a scalar field that gives a multiscale
analogue of the classical Fréchet function $V(x) = E[\|y - x\|^2]$ of a
random variable $y$ with finite second moment. The Fréchet function
provides a more geometric interpretation of the mean as the unique
minimizer of $V$; that is, the point $\mu \in \mathbb{R}^d$ with respect to
which the spread of $y$ is minimal. Similarly, the local extrema and other
properties of the multiscale Fréchet function provide a wealth of
information about the distribution of $y$. In fact, we show that the
distribution of any random vector may be fully recovered from the
multiscale Fréchet function associated with the Gaussian kernel.

Several variants of empirical localized or weighted covariance previously
have been used in data analysis, but we develop a framework for the
formulation and systematic treatment of such problems. Allard et al. have
developed a computational model termed geometric multi-resolution analysis
for multiscale data analysis based on covariance localized to hierarchies
of dyadic cubes [7]. In computer graphics, local principal component
analysis (PCA) is commonly used in the estimation of normals to surfaces
from point-cloud data [8] for surface reconstruction; see also [2] and
references therein. In computer vision, tensor voting by Medioni et al.
[10] has been applied to multiple image analysis and processing tasks.
Brox et al. have used empirical covariance weighted by the isotropic
Gaussian kernel in nonparametric density estimation targeting applications
in motion tracking [11]. In the literature dealing with clustering,
especially clustering of multiple possibly intersecting manifolds, local
PCA ideas have been used in the works of Kushnir et al. [12], Goldberg et
al. [13], Gong et al. [14], Wang et al. [15], and in a series of papers by
Arias-Castro and collaborators [16, 17, 18].



Figure 1: Examples of data clustered along intersecting manifolds. 

The special case of affine linear subspaces, known as subspace clustering,
has been addressed in the machine learning and computer vision literature
by many authors using a variety of techniques (cf. [19, 20, 21, 22, 23, 24,
25, 26]). More general manifold clustering has been considered in [27, 28,
29, 30, 31, 15, 16, 17, 18]. In our approach, we exploit the fact that
localized covariance tensors encode rich information about the tangential
structure of the submanifolds that underlie the data. Combined with
information about the (relative) position of the data points, they yield an
effective data representation for manifold clustering. Although several
different clustering techniques could be applied to the "tensorized" data,
we use the single linkage hierarchical method because it produces provably
stable dendrograms. In conjunction with the stability and consistency
results for covariance fields, this ensures that the manifold clustering
method is stable at all steps. Dendrogram stability is analyzed in the
framework of [52].

Contributions and Organization of the paper. The paper includes several
illustrations and applications of CTFs to data analysis. For example, to
illustrate how geometric information can be extracted from CTFs, we show
that the curvature of plane curves and the principal curvatures of surfaces
in $\mathbb{R}^3$ can be calculated from the spectrum of multiscale CTFs.
Thus, multiscale covariance tensors give a way of extending these
infinitesimal measures of geometric complexity to all scales and general
probability distributions, not just those supported on smooth
submanifolds. We also apply multiscale CTFs to manifold clustering, the
problem of clustering Euclidean data that are organized along a finite
union of possibly intersecting smooth submanifolds. Fig. 1 shows three such
examples.

The main goals of the paper are: (i) to establish the foundations for
analysis, visualization and management of data with methods based on
multiscale covariance tensor fields, and (ii) to describe applications that
characterize the usefulness of CTFs in data analysis. In Section 2, we
formulate the notion of multiscale CTFs for a broad class of kernels and
give examples that illustrate how CTFs reveal the geometry of data. In
Section 3, we show that the curvature of a plane curve and the principal
curvatures of a surface in $\mathbb{R}^3$ can be recovered from small-scale
covariance. Section 4 is devoted to the main theoretical developments. We
prove stability and consistency theorems for multiscale covariance tensor
fields under mild regularity assumptions on the kernel, and also analyze
rates of convergence that are important for applications in data analysis.
Since some discontinuous kernels are of practical interest, we also
investigate convergence results for such kernels, including a pointwise
central limit theorem. Multiscale Fréchet functions and manifold clustering
are discussed in the sections that follow. We close with a summary and some
discussion.

2. Covariance Tensor Fields


2.1. Preliminaries 

To define covariance tensor fields, we introduce some notation. Elements
of the tensor product $\mathbb{R}^d \otimes \mathbb{R}^d$ may be identified
with bilinear forms $\Sigma \colon \mathbb{R}^d \times \mathbb{R}^d \to
\mathbb{R}$ through the Euclidean inner product. More precisely, a pure
tensor $x \otimes y$ corresponds to the bilinear form

$$x \otimes y\,(u, v) = \langle x, u \rangle \cdot \langle y, v \rangle, \tag{1}$$

$\forall u, v \in \mathbb{R}^d$, where $\langle\,,\rangle$ denotes Euclidean
inner product. Bilinear forms associated with more general elements of
$\mathbb{R}^d \otimes \mathbb{R}^d$ can be described by linear extension. In
Euclidean coordinates, we abuse notation and also write the coordinate
vectors of $x, y \in \mathbb{R}^d$ as $x$ and $y$. With this convention,
letting $A$ be the $d \times d$ matrix $A = x y^T$, we have

$$x \otimes y\,(u, v) = \langle u, A v \rangle, \tag{2}$$

where the superscript $T$ denotes transposition. In this manner, using
Euclidean coordinates, an element of $\mathbb{R}^d \otimes \mathbb{R}^d$
also can be identified with a $d \times d$ matrix by linear extension of the
correspondence $x \otimes y \mapsto A$. Through these identifications, we
refer to an element $\Sigma \in \mathbb{R}^d \otimes \mathbb{R}^d$
interchangeably as a tensor, a bilinear form or a matrix. We equip
$\mathbb{R}^d \otimes \mathbb{R}^d$ with the inner product defined on pure
tensors by

$$\langle x_1 \otimes y_1, x_2 \otimes y_2 \rangle = \langle x_1, x_2 \rangle \langle y_1, y_2 \rangle \tag{3}$$

and extended linearly to $\mathbb{R}^d \otimes \mathbb{R}^d$. Thus, the
corresponding norm satisfies

$$\|x \otimes y\| = \|x\| \|y\| \tag{4}$$

for any $x, y \in \mathbb{R}^d$. In matrix representation, this is the
Frobenius norm.

Throughout the paper, we view $\mathbb{R}^d$ as a measurable space equipped
with the Borel $\sigma$-algebra for the Euclidean metric. Let $y$ be an
$\mathbb{R}^d$-valued random variable distributed according to the
probability measure $\alpha$. Suppose that $y$ has expected value
$E[y] = \mu \in \mathbb{R}^d$ and finite second moment. As a motivation for
the definition of multiscale CTFs, recall that the covariance tensor of $y$
is defined as

$$\Sigma_\alpha(\mu) = E[(y - \mu) \otimes (y - \mu)] = \int_{\mathbb{R}^d} (y - \mu) \otimes (y - \mu)\, \alpha(dy) \in \mathbb{R}^d \otimes \mathbb{R}^d. \tag{5}$$

In matrix notation,

$$\Sigma_\alpha(\mu) = \int_{\mathbb{R}^d} (y - \mu)(y - \mu)^T \alpha(dy). \tag{6}$$

The bilinear form associated with $\Sigma_\alpha(\mu)$ clearly is symmetric
and positive semi-definite.

Covariation of $y$ may be measured with respect to any $x \in \mathbb{R}^d$,
not just $\mu$. Thus, $\Sigma_\alpha(\mu)$ may be extended to a global
covariance tensor field $\Sigma_\alpha \colon \mathbb{R}^d \to \mathbb{R}^d
\otimes \mathbb{R}^d$ given by

$$\Sigma_\alpha(x) = \int_{\mathbb{R}^d} (y - x) \otimes (y - x)\, \alpha(dy). \tag{7}$$

Note, however, that

$$\Sigma_\alpha(x) = \Sigma_\alpha(\mu) + (\mu - x) \otimes (\mu - x), \tag{8}$$

for any $x \in \mathbb{R}^d$. Thus, for $x \neq \mu$, $\Sigma_\alpha(x)$ does
not reveal any information about the distribution of $y$ other than that
already contained in $\Sigma_\alpha(\mu)$. In contrast, as we shall see
below, multiscale analogues are rich in information about the shape of
$\alpha$.
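
The identity (8) holds exactly for empirical measures, which makes it easy to check numerically. The following is a minimal sketch, not part of the original text: the sample, the base point `x` and the helper names `outer` and `covariance_about` are illustrative assumptions.

```python
import random

def outer(u, v):
    """Matrix of the pure tensor u (x) v, i.e. u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def covariance_about(points, x):
    """Empirical global covariance (1/n) * sum_i (y_i - x)(y_i - x)^T."""
    d, n = len(x), len(points)
    acc = [[0.0] * d for _ in range(d)]
    for y in points:
        diff = [yi - xi for yi, xi in zip(y, x)]
        for i in range(d):
            for j in range(d):
                acc[i][j] += diff[i] * diff[j] / n
    return acc

rng = random.Random(0)  # fixed seed; the sample is illustrative only
points = [(rng.gauss(0.0, 1.0), rng.gauss(1.0, 2.0)) for _ in range(500)]
mu = [sum(p[k] for p in points) / len(points) for k in range(2)]

x = (1.5, -0.7)  # arbitrary base point (assumption)
lhs = covariance_about(points, x)            # left-hand side of (8)
offset = [m - xi for m, xi in zip(mu, x)]    # mu - x
shift = outer(offset, offset)
at_mean = covariance_about(points, mu)
rhs = [[at_mean[i][j] + shift[i][j] for j in range(2)] for i in range(2)]

# Largest entrywise discrepancy between the two sides of (8).
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The two sides agree up to floating-point rounding, which illustrates why the global field carries no information beyond the mean and the covariance about the mean.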

2.2. Multiscale Covariance Tensor Fields 

We adopt the notation $v_d$ for the volume of the unit ball in
$\mathbb{R}^d$ and $\omega_{d-1}$ for the "surface area" of the unit sphere
$S^{d-1} \subset \mathbb{R}^d$, $d \geq 1$. Recall that
$\omega_{d-1} = 2\pi^{d/2}/\Gamma(d/2)$, where $\Gamma(\cdot)$ is the Gamma
function, and $\omega_{d-1} = d\, v_d$.
We make the convention that = h 

Let $y$ be an $\mathbb{R}^d$-valued random variable with distribution
$\alpha$ and let $K$ be a multiscale kernel; that is, a measurable function
$K \colon \mathbb{R}^d \times \mathbb{R}^d \times (0, \infty) \to \mathbb{R}$
such that $K(x, y, \sigma) \geq 0$, for any $x, y \in \mathbb{R}^d$ and
$\sigma > 0$.

Definition 1. The multiscale covariance tensor field (CTF) of $y$
associated with the kernel $K$ is the one-parameter family of tensor
fields, indexed by $\sigma \in (0, \infty)$, given by

$$\Sigma_\alpha(x, \sigma) := \int_{\mathbb{R}^d} (y - x) \otimes (y - x)\, K(x, y, \sigma)\, \alpha(dy), \tag{9}$$

provided that the integral converges for each $x \in \mathbb{R}^d$ and
$\sigma > 0$.





Remark 1. Note that $\Sigma_\alpha$ depends only on the probability measure
$\alpha$, not on $y$. For this reason, we refer to $\Sigma_\alpha$
interchangeably as the multiscale CTF of the random variable $y$ or the
probability measure $\alpha$.

$\Sigma_\alpha(x, \sigma)$ measures the covariation of $y$ about $x$ with
probability mass at $y$ weighted by $K(x, y, \sigma)$. It is simple to
verify that the bilinear form $\Sigma_\alpha(x, \sigma)$ is symmetric and
positive semi-definite. Note that if $K$ is bounded for each $\sigma > 0$,
that is, $\exists A_\sigma > 0$ such that $K(x, y, \sigma) \leq A_\sigma$,
$\forall x, y \in \mathbb{R}^d$, then $\Sigma_\alpha(x, \sigma)$ is well
defined for any random variable $y$ with finite second moment. In
particular, if $K \equiv 1$, $\Sigma_\alpha(x, \sigma) = \Sigma_\alpha(x)$,
$\forall x \in \mathbb{R}^d$. However, as our primary goal is to study the
organization of data and random variables at scales ranging from local to
global, we consider kernels in $\mathbb{R}^d$ that satisfy additional decay
conditions as they produce a windowing effect. The kernels are constructed
as follows.

Definition 2. Let $d$ be a positive integer and $f \colon [0, \infty) \to
\mathbb{R}$ a bounded and measurable function satisfying:

(a) $f(r) \geq 0$, $\forall r \in [0, \infty)$;

(b) $M_d = \int_0^\infty r^{\frac{d}{2} - 1} f(r)\, dr < \infty$;

(c) There is $C > 0$ such that $r f(r) \leq C$, $\forall r \in [0, \infty)$.

The multiscale kernel $K \colon \mathbb{R}^d \times \mathbb{R}^d \times
(0, \infty) \to \mathbb{R}$ associated with $f$ is defined as

$$K(x, y, \sigma) = \frac{1}{C_d(\sigma)}\, f\!\left(\frac{\|y - x\|^2}{\sigma^2}\right), \tag{10}$$

where $C_d(\sigma) = \frac{1}{2} \sigma^d M_d\, \omega_{d-1}$.

Condition (b) in the definition implies that the normalizing constant
$C_d(\sigma)$ is well defined. The normalization is adopted so that
$\int K(x, y, \sigma)\, dy = 1$, $\forall x \in \mathbb{R}^d$ and
$\forall \sigma > 0$. Condition (c) guarantees that the integral in (9) is
convergent for any probability measure $\alpha$. Henceforth, for
convenience, we assume that $\sup f = 1$. This is not restrictive since
scaling $f$ does not change the kernel $K$ because of the normalization.
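
As a sanity check on the normalization, one can evaluate $C_d(\sigma) = \frac{1}{2}\sigma^d M_d\, \omega_{d-1}$ numerically and compare it with the familiar constants $(2\pi\sigma^2)^{d/2}$ and $\sigma^d v_d$. The sketch below assumes the profiles $f(r) = e^{-r/2}$ (Gaussian) and the indicator of $[0, 1]$ (truncation); the function names, the cutoff `s_max` and the midpoint rule are illustrative choices, not taken from the paper.

```python
import math

def profile_moment(f, d, s_max=12.0, steps=20000):
    """M_d = int_0^inf r^(d/2 - 1) f(r) dr.

    With r = s^2 this equals 2 * int_0^inf s^(d-1) f(s^2) ds, which has no
    singularity at 0. The integral is truncated at s_max (midpoint rule).
    """
    h = s_max / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += s ** (d - 1) * f(s * s)
    return 2.0 * total * h

def unit_sphere_area(d):
    """omega_{d-1} = 2 pi^(d/2) / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def normalizing_constant(f, d, sigma):
    """C_d(sigma) = (1/2) sigma^d M_d omega_{d-1}."""
    return 0.5 * sigma ** d * profile_moment(f, d) * unit_sphere_area(d)

def gaussian_profile(r):
    return math.exp(-r / 2.0)

def truncation_profile(r):
    return 1.0 if r <= 1.0 else 0.0
```

For the Gaussian profile the computed constant should match $(2\pi\sigma^2)^{d/2}$ closely; for the truncation profile it should match $\sigma^d v_d$ up to the coarser quadrature error caused by the jump at $r = 1$.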

Whereas we investigate properties of multiscale CTFs in a more general
setting, our examples and experiments focus on two special kernels:

(i) The isotropic Gaussian kernel

$$G(x, y, \sigma) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|y - x\|^2}{2\sigma^2}\right), \tag{11}$$

which is associated with the function $f(x) = e^{-x/2}$;

(ii) The truncation kernel

$$T(x, y, \sigma) = \frac{1}{\sigma^d v_d}\, \chi\!\left(\frac{\|y - x\|^2}{\sigma^2}\right), \tag{12}$$

associated with the characteristic function $\chi \colon [0, \infty) \to
\mathbb{R}$ of the unit interval $[0, 1]$. In measuring covariation of
random variables about $x \in \mathbb{R}^d$, the kernel $T$ attributes a
uniform weight to mass at points within the closed ball of radius $\sigma$
centered at $x$ and weight zero to mass elsewhere.

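Remark 2. The kernel $K$ defined in (10) is homogeneous and isotropic; that
is, for any isometry $\varphi \colon \mathbb{R}^d \to \mathbb{R}^d$,
$K(\varphi(x), \varphi(y), \sigma) = K(x, y, \sigma)$, $\forall x, y \in
\mathbb{R}^d$ and $\sigma > 0$. Moreover, if we write $\varphi(x) = Ux + b$,
with $U \in O(d)$ and $b \in \mathbb{R}^d$, then, in matrix notation,

$$\Sigma_{\varphi_*(\alpha)}(\varphi(x), \sigma) = U\, \Sigma_\alpha(x, \sigma)\, U^T, \tag{13}$$

for any $(x, \sigma) \in \mathbb{R}^d \times (0, \infty)$. Here $O(d)$ is
the group of $d \times d$ orthogonal matrices and $\varphi_*(\alpha)$ is the
pushforward of $\alpha$ under $\varphi$.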
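
The behavior of covariance fields under isometries described in Remark 2 can be illustrated on an empirical measure. The sketch below works in $\mathbb{R}^2$ with the Gaussian kernel and compares the field of the moved data at $\varphi(x)$ with the conjugated field $U \Sigma U^T$ (our matrix-form reading of the remark). The rotation angle, translation, sample and helper names are assumptions made for illustration.

```python
import math
import random

def gaussian_ctf(points, x, sigma):
    """Empirical CTF in R^2 for the isotropic Gaussian kernel:
    (1/n) * sum_i (y_i - x)(y_i - x)^T * G(x, y_i, sigma)."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)  # (2 pi sigma^2)^(-d/2), d = 2
    s = [[0.0, 0.0], [0.0, 0.0]]
    for y in points:
        u = (y[0] - x[0], y[1] - x[1])
        w = norm * math.exp(-(u[0] ** 2 + u[1] ** 2) / (2.0 * sigma ** 2)) / len(points)
        for i in range(2):
            for j in range(2):
                s[i][j] += w * u[i] * u[j]
    return s

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

theta = 0.9  # rotation angle (assumption)
U = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
b = (0.3, -1.2)  # translation (assumption)

def phi(z):
    """Isometry phi(z) = U z + b."""
    return (U[0][0] * z[0] + U[0][1] * z[1] + b[0],
            U[1][0] * z[0] + U[1][1] * z[1] + b[1])

rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
x, sigma = (0.2, 0.1), 0.5

moved = gaussian_ctf([phi(p) for p in points], phi(x), sigma)
rotated = matmul(matmul(U, gaussian_ctf(points, x, sigma)), transpose(U))
max_gap = max(abs(moved[i][j] - rotated[i][j]) for i in range(2) for j in range(2))
```

This equivariance is what allows the examples to assume, without loss of generality, that the base point is the origin.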

Remark 3. Multiscale covariance tensor fields can be defined for any
positive Borel measure $\alpha$ that satisfies

$$\int_{\mathbb{R}^d} \|y - x\|^2 K(x, y, \sigma)\, \alpha(dy) < \infty, \quad \forall x \in \mathbb{R}^d,\ \forall \sigma > 0, \tag{14}$$

not just for probability measures. In particular, if $f$ has compact
support, covariance fields are defined for any locally finite Borel measure
$\alpha$; that is, measures for which every point $p \in \mathbb{R}^d$ has
an open neighborhood $U_p$ such that $\alpha(U_p) < \infty$.

We conclude this section with examples that support our contention that
multiscale covariance tensor fields are rich in information about the
shape of data.

Example 1. This example shows that the spectrum of multiscale covariance
tensors allows us to estimate the dimensionality of data in a
scale-dependent manner. We consider the data points in $\mathbb{R}^2$ shown
in Figure 2 and calculate $\Sigma_{\alpha_n}(x, \sigma)$ centered at one of
the data points for the Gaussian kernel at scales $\sigma = 0.1$ and
$\sigma = 2$. Here $\alpha_n$ denotes the empirical measure. The covariance
tensors are depicted as ellipses whose principal axes are in the direction
of the eigenvectors of the covariance matrix and principal radii are
proportional to $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$, where
$0 \leq \lambda_1 \leq \lambda_2$ are the eigenvalues of the covariance. At
scale $\sigma = 0.1$, $\lambda_1/\lambda_2 = 0.908$, showing that the
covariance tensor is nearly isotropic, indicating that the "dimension" of
the data is 2. At $\sigma = 2$, the ratio of the eigenvalues is 0.025,
giving a highly anisotropic covariance tensor, from which we infer that the
dimension is 1.
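
The scale-dependent dimension estimate of Example 1 can be mimicked on synthetic data. The sketch below is not the data set of Figure 2: it uses a hypothetical thin strip sampled on a regular grid, and the helper names are illustrative. It computes the eigenvalue ratio $\lambda_1/\lambda_2$ of the empirical Gaussian CTF at a small and a large scale.

```python
import math

def gaussian_ctf(points, x, sigma):
    """Empirical CTF in R^2 for the isotropic Gaussian kernel."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for y in points:
        u = (y[0] - x[0], y[1] - x[1])
        w = norm * math.exp(-(u[0] ** 2 + u[1] ** 2) / (2.0 * sigma ** 2)) / len(points)
        for i in range(2):
            for j in range(2):
                s[i][j] += w * u[i] * u[j]
    return s

def eigenvalue_ratio(m):
    """lambda_min / lambda_max of a symmetric 2x2 matrix."""
    half_trace = 0.5 * (m[0][0] + m[1][1])
    radius = math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])
    return (half_trace - radius) / (half_trace + radius)

# Hypothetical data: the strip [-3, 3] x [-0.3, 0.3] sampled on a regular grid.
step = 0.02
points = [(-3.0 + i * step, -0.3 + j * step) for i in range(301) for j in range(31)]
center = (0.0, 0.0)

ratio_small = eigenvalue_ratio(gaussian_ctf(points, center, 0.1))  # strip looks 2-dimensional
ratio_large = eigenvalue_ratio(gaussian_ctf(points, center, 2.0))  # strip looks 1-dimensional
```

At the small scale the window fits inside the strip and the ratio is close to 1; at the large scale the strip's width is negligible and the ratio is close to 0, in line with the qualitative behavior reported in Example 1.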



[Panels: σ = 0.1 (left), σ = 2 (right)]

Figure 2: Estimating data dimensionality at different scales through multiscale covariance.


Example 2 (A linear subspace of $\mathbb{R}^d$). Let $e_1, \ldots, e_r \in
\mathbb{R}^d$, $1 \leq r \leq d$, be orthonormal vectors and consider the
subspace $H = \langle e_1, \ldots, e_r \rangle$ that they span. Let
$\alpha$ denote the singular measure supported on $H$ induced by the volume
form on $H$. The measure $\alpha$ clearly is locally finite. We calculate
multiscale covariance fields at points $x \in H$ to show that $H$ may be
recovered from $\Sigma_\alpha(x, \sigma)$. By Remark 2, we may assume that
$x = 0$. A calculation shows that for the Gaussian kernel,

$$\Sigma_\alpha(0, \sigma) = \frac{\sigma^2}{(2\pi\sigma^2)^{(d-r)/2}} \sum_{i=1}^{r} e_i \otimes e_i. \tag{15}$$

For the truncation kernel,

$$\Sigma_\alpha(0, \sigma) = \lambda_r \sum_{i=1}^{r} e_i \otimes e_i, \tag{16}$$

where

$$\lambda_r = \frac{1}{\sigma^{d-r-2}}\, \frac{\omega_{r-2}}{(r+2)\, v_d} \int_0^{\pi} \sin^{r-2}\theta\, \cos^2\theta\, d\theta. \tag{17}$$

For $r = 1$, this expression simplifies to $\lambda_1 = 2/(3\sigma^{d-3} v_d)$.
Thus, for both kernels, the orthogonal complement of $H$ is the null space
of $\Sigma_\alpha(0, \sigma)$ and $H$ is the eigenspace associated with the
positive eigenvalue.

Example 3 (Wedge of n segments). Consider the wedge (one-point union) $W$
of $n$ segments $L_1, \ldots, L_n$ in $\mathbb{R}^d$ attached at the origin,
as depicted in Fig. 3. Each segment $L_i$ is determined by its length
$\ell_i > 0$ and a unit direction vector $v_i$. We assume that
$v_i \neq v_j$, for any $1 \leq i < j \leq n$. Let $\alpha$ be the singular
measure on $\mathbb{R}^d$ that is supported on $W$ and agrees with the
measure induced by arc length on each segment $L_i$. We consider the
multiscale covariance field of $\alpha$ associated with the truncation
kernel. For $x \in L_i$, $x \neq 0$, as in the case $r = 1$ in Example 2, we
have that $\Sigma_\alpha(x, \sigma) = (2/(3\sigma^{d-3} v_d))\, v_i \otimes v_i$
at small enough scales. Thus, $\Sigma_\alpha(x, \sigma)$ has rank one.
However, at the origin,

$$\Sigma_\alpha(0, \sigma) = \frac{1}{3\sigma^{d} v_d} \sum_{i=1}^{n} \left(\min\{\sigma, \ell_i\}\right)^3 v_i \otimes v_i, \tag{18}$$

for any $\sigma > 0$. Thus, for $\sigma \leq \min\{\ell_i,\ 1 \leq i \leq n\}$,

$$\Sigma_\alpha(0, \sigma) = \frac{1}{3\sigma^{d-3} v_d} \sum_{i=1}^{n} v_i \otimes v_i. \tag{19}$$


Figure 3: Covariance at the one-point union of line segments. 
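
The covariance at the vertex of a wedge can be checked by direct quadrature along the segments. The sketch below assumes the truncation kernel assigns weight $1/(\sigma^d v_d)$ inside the ball of radius $\sigma$, and compares a midpoint-rule integral with the closed form $\frac{1}{3\sigma^d v_d}\sum_i \min\{\sigma, \ell_i\}^3\, v_i \otimes v_i$. The two-segment configuration and all function names are hypothetical.

```python
import math

def unit_ball_volume(d):
    """v_d = pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def wedge_ctf_origin_formula(directions, lengths, sigma):
    """Closed form at the origin: (1 / (3 sigma^d v_d)) sum_i min(sigma, l_i)^3 v_i v_i^T."""
    d = len(directions[0])
    c = 1.0 / (3.0 * sigma ** d * unit_ball_volume(d))
    m = [[0.0] * d for _ in range(d)]
    for v, length in zip(directions, lengths):
        w = c * min(sigma, length) ** 3
        for i in range(d):
            for j in range(d):
                m[i][j] += w * v[i] * v[j]
    return m

def wedge_ctf_origin_quadrature(directions, lengths, sigma, steps=2000):
    """Midpoint-rule integral of y y^T T(0, y, sigma) along each segment."""
    d = len(directions[0])
    weight = 1.0 / (sigma ** d * unit_ball_volume(d))  # kernel value inside the ball
    m = [[0.0] * d for _ in range(d)]
    for v, length in zip(directions, lengths):
        h = length / steps
        for k in range(steps):
            s = (k + 0.5) * h  # arc-length parameter; the point is y = s v
            if s <= sigma:
                for i in range(d):
                    for j in range(d):
                        m[i][j] += weight * s * s * v[i] * v[j] * h
    return m

# Hypothetical wedge in R^2: two segments meeting at the origin at 60 degrees.
directions = [(1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
lengths = [1.0, 0.5]
sigma = 0.8

exact = wedge_ctf_origin_formula(directions, lengths, sigma)
approx = wedge_ctf_origin_quadrature(directions, lengths, sigma)
```

With two non-parallel directions the resulting matrix has rank two, in contrast with the rank-one tensors found at interior points of a segment.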


3. Geometry of Curves and Surfaces 

In this section, we show how multiscale CTFs associated with the
truncation kernel extract precise local geometric information from plane
curves and surfaces in $\mathbb{R}^3$.

3.1. Plane Curves

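Example 4. We begin with the special case of a circle. Let $C_R \subset
\mathbb{R}^2$ be the circle of radius $R$ centered at the origin in
$\mathbb{R}^2$ and $\alpha$ the singular measure supported on $C_R$ induced
by arc length. For any $x \in \mathbb{R}^2$, we denote $r = \|x\|$. If $x$
is such that $|r - R| > \sigma$, then $\Sigma_\alpha(x, \sigma) = 0$. Assume
that $x \in \mathbb{R}^2$ and $0 < \sigma < R$ are such that
$r \in [R - \sigma, R + \sigma]$. In this case, in the coordinate system
given by the directions $n = x/\|x\|$ and $t$, the unit vector orthogonal
to $n$, a calculation shows that $\Sigma_\alpha(x, \sigma)$ is diagonal with
entries

$$\begin{aligned}
\lambda_n(x, \sigma) &= \frac{1}{\pi\sigma^2} \left[ R\varphi\,(R^2 + 2r^2) + R^2 (R\cos\varphi - 4r) \sin\varphi \right], \\
\lambda_t(x, \sigma) &= \frac{R^3}{\pi\sigma^2} \left( \varphi - \sin\varphi \cos\varphi \right),
\end{aligned} \tag{20}$$

where $\varphi = \arccos\left( \frac{R^2 + r^2 - \sigma^2}{2rR} \right)$.
Thus, the normal and tangential vectors, $n$ and $t$, are eigenvectors with
eigenvalues $\lambda_n$ and $\lambda_t$, respectively. Fig. 4 shows the
eigenvalues as functions of $r$, $0.9 \leq r \leq 1.1$, for $\sigma = 0.1$.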



Figure 4: Tangential (blue) and normal (red) eigenvalues as a function of r, 0.9 ≤ r ≤ 1.1,
at σ = 0.1, of the multiscale CTF associated with the truncation kernel for the singular
measure induced by arc length, supported on the unit circle in R^2.
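
The eigenvalues for the circle can be cross-checked by integrating $(y - x)(y - x)^T$ over the arc of the circle inside the ball of radius $\sigma$ about $x$. The sketch below assumes the closed forms $\lambda_n = \frac{1}{\pi\sigma^2}[R\varphi(R^2 + 2r^2) + R^2(R\cos\varphi - 4r)\sin\varphi]$ and $\lambda_t = \frac{R^3}{\pi\sigma^2}(\varphi - \sin\varphi\cos\varphi)$ with $\varphi = \arccos\frac{R^2 + r^2 - \sigma^2}{2rR}$; the function names and quadrature resolution are illustrative.

```python
import math

def circle_eigenvalues_formula(R, r, sigma):
    """Closed-form normal and tangential eigenvalues for the circle of radius R."""
    phi = math.acos((R * R + r * r - sigma * sigma) / (2.0 * r * R))
    c = 1.0 / (math.pi * sigma ** 2)
    lam_n = c * (R * phi * (R * R + 2.0 * r * r)
                 + R * R * (R * math.cos(phi) - 4.0 * r) * math.sin(phi))
    lam_t = c * R ** 3 * (phi - math.sin(phi) * math.cos(phi))
    return lam_n, lam_t

def circle_eigenvalues_quadrature(R, r, sigma, steps=200000):
    """Midpoint rule over the whole circle; x = (r, 0), so n = e_1 and t = e_2."""
    c = 1.0 / (math.pi * sigma ** 2)  # 1 / (sigma^d v_d) with d = 2
    h = 2.0 * math.pi / steps
    lam_n = lam_t = 0.0
    for k in range(steps):
        theta = -math.pi + (k + 0.5) * h
        un = R * math.cos(theta) - r   # normal component of y - x
        ut = R * math.sin(theta)       # tangential component of y - x
        if un * un + ut * ut <= sigma * sigma:
            lam_n += c * un * un * R * h
            lam_t += c * ut * ut * R * h
    return lam_n, lam_t
```

Evaluating both functions over a range of $r$ at fixed $\sigma$ gives curves of the kind described in the caption of Figure 4.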


Now we consider a general smooth curve $C \subset \mathbb{R}^2$, that is, a
1-dimensional, smooth, properly embedded submanifold of $\mathbb{R}^2$. Let
$\alpha$ be the singular measure on $\mathbb{R}^2$ supported on $C$ and
induced by arc length. This measure is locally finite because the embedding
is proper. We calculate the small-scale covariance at points on $C$ for the
truncation kernel and show that the curvature can be recovered from the
eigenvalues of $\Sigma_\alpha$. Let $x \in C$ be fixed. The arc-length
parametrization of $C$ near $x$ may be written as

$$X(s) = s - \frac{\kappa^2 s^3}{6} + O(s^4) \quad \text{and} \quad Y(s) = \frac{\kappa s^2}{2} + \frac{\kappa_s s^3}{6} + O(s^4), \tag{21}$$

where $X(s)$ and $Y(s)$ are coordinates along the tangent and normal to $C$
at $x$, respectively [32]. Here, the curvature $\kappa$ and its derivative
$\kappa_s$ are evaluated at $x$. A calculation yields:


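Proposition 1. Let $\sigma > 0$ be small. If $C$ is a smooth plane curve and
$x \in C$, then in the coordinates specified above we have

$$\Sigma_\alpha(x, \sigma) = \begin{bmatrix}
\dfrac{2\sigma}{3\pi} - \dfrac{\kappa^2\sigma^3}{20\pi} + O(\sigma^4) & \dfrac{\kappa_s\sigma^3}{15\pi} + O(\sigma^4) \\[2mm]
\dfrac{\kappa_s\sigma^3}{15\pi} + O(\sigma^4) & \dfrac{\kappa^2\sigma^3}{10\pi} + O(\sigma^4)
\end{bmatrix}. \tag{22}$$

Proposition 1 implies that, for $\sigma > 0$ small, the eigenvalues of
$\Sigma_\alpha(x, \sigma)$ are

$$\lambda_1 = \frac{2\sigma}{3\pi} - \frac{\kappa^2\sigma^3}{20\pi} + O(\sigma^4) \quad \text{and} \quad \lambda_2 = \frac{\kappa^2\sigma^3}{10\pi} + O(\sigma^4), \tag{23}$$

so that

$$\operatorname{tr} \Sigma_\alpha(x, \sigma) = \frac{2\sigma}{3\pi} + \frac{\kappa^2\sigma^3}{20\pi} + O(\sigma^4). \tag{24}$$

Thus, the curvature at $x \in C$ may be recovered, up to a sign, as

$$\kappa = \lim_{\sigma \to 0} \left( \frac{20\pi}{\sigma^3} \left( \operatorname{tr} \Sigma_\alpha(x, \sigma) - \frac{2\sigma}{3\pi} \right) \right)^{1/2}. \tag{25}$$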
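
The recovery of curvature from the trace can be tried out on a circle, where the trace is available in closed form. The sketch below assumes the small-scale expansion $\operatorname{tr}\Sigma_\alpha(x, \sigma) \approx \frac{2\sigma}{3\pi} + \frac{\kappa^2\sigma^3}{20\pi}$ and inverts it at a fixed small $\sigma$ instead of taking a limit. The closed-form trace for the circle is our own elementary computation, and the function names are hypothetical.

```python
import math

def circle_trace_on_curve(R, sigma):
    """tr Sigma_alpha(x, sigma) for x on a circle of radius R (truncation kernel, d = 2).

    Points of the circle within distance sigma of x span angles |theta| <= theta0,
    where 2 R sin(theta0 / 2) = sigma, and ||y - x||^2 = 4 R^2 sin^2(theta / 2).
    Integrating against arc length R d(theta) and dividing by pi sigma^2 gives
    the expression below.
    """
    theta0 = 2.0 * math.asin(sigma / (2.0 * R))
    return 4.0 * R ** 3 * (theta0 - math.sin(theta0)) / (math.pi * sigma ** 2)

def curvature_from_trace(trace, sigma):
    """Invert tr = 2 sigma / (3 pi) + kappa^2 sigma^3 / (20 pi), dropping higher-order terms."""
    excess = trace - 2.0 * sigma / (3.0 * math.pi)
    return math.sqrt(max(excess, 0.0) * 20.0 * math.pi / sigma ** 3)

R, sigma = 2.0, 0.1
kappa_estimate = curvature_from_trace(circle_trace_on_curve(R, sigma), sigma)
```

For a circle of radius $R$ the estimate should be close to $1/R$, with a relative error that shrinks as $\sigma/R$ decreases. In practice the subtraction of two nearly equal quantities makes this estimator sensitive to noise at very small scales.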


3.2. Surfaces in $\mathbb{R}^3$

Example 5. Let $S_R$ be the sphere of radius $R$ centered at the origin in
$\mathbb{R}^3$. For $x \in \mathbb{R}^3$, we let $r = \|x\|$. If $x$ is such
that $|r - R| > \sigma$, then $\Sigma_\alpha(x, \sigma) = 0$. Assume that
$x \neq (0, 0, 0)$ and $\sigma > 0$ are such that
$r \in [R - \sigma, R + \sigma]$. In the coordinate system given by the
vector $n = x/\|x\|$, and any orthonormal basis $\{t_1, t_2\}$ of the
orthogonal complement $n^\perp$, a direct calculation shows that
$\Sigma_\alpha(x, \sigma)$ is a $3 \times 3$ diagonal matrix with entries

$$\begin{aligned}
\lambda_{t_1}(x, \sigma) &= \lambda_{t_2}(x, \sigma) = \frac{R^4}{4\sigma^3} (1 - \cos\phi)^2 (\cos\phi + 2), \\
\lambda_n(x, \sigma) &= \frac{R^2}{2\sigma^3} (1 - \cos\phi) \left( R^2 + R\cos\phi\,(R\cos\phi + R - 3r) \right) \\
&\quad + \frac{R^2}{2\sigma^3} (1 - \cos\phi)(-3Rr + 3r^2),
\end{aligned} \tag{26}$$

where $\phi = \arccos\left( \frac{R^2 + r^2 - \sigma^2}{2rR} \right)$. In
particular, this means that $\lambda_n$ is the eigenvalue corresponding to
the eigenvector $n$ along the normal direction to the sphere at
$R\,x/\|x\|$, and $t_1, t_2$ span the eigenspace along the tangent
directions with eigenvalue $\lambda_{t_1} = \lambda_{t_2}$. Fig. 5 shows a
plot of the eigenvalues as a function of $r$, $0.9 \leq r \leq 1.1$, for
$\sigma = 0.1$.

Figure 5: Tangential (blue) and normal (red) eigenvalues of the multiscale CTF associated
with the truncation kernel as a function of r, 0.9 ≤ r ≤ 1.1, at σ = 0.1, for the singular
measure induced by surface area, supported on the unit sphere in R^3.

Now we consider a general smooth compact surface $S \subset \mathbb{R}^3$.
Let $\alpha$ be the singular measure on $\mathbb{R}^3$ supported on $S$ and
induced by the area measure on $S$. We calculate the small-scale covariance
at points on $S$ for the truncation kernel and show that the principal
curvatures may indeed be recovered from the spectrum of $\Sigma_\alpha$.
Given a non-umbilic point $p \in S$, one can choose a Cartesian coordinate
system centered at $p$ so that the $x$-axis is along the direction of
maximal curvature at $p$, the $y$-axis is along the direction of minimal
curvature at $p$, and the $z$-axis is along the normal to $S$ at $p$.


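Proposition 2. Let $\sigma > 0$ be small, $p \in S$ be non-umbilic, and
$\alpha$ be the surface area measure on $S$. In the coordinate system
described above, the covariance tensor for the truncation kernel is given
by

$$\Sigma_\alpha(p, \sigma) = \begin{bmatrix}
\lambda_x & O(\sigma^4) & O(\sigma^4) \\
O(\sigma^4) & \lambda_y & O(\sigma^4) \\
O(\sigma^4) & O(\sigma^4) & \lambda_z
\end{bmatrix}, \tag{27}$$

where

$$\begin{aligned}
\lambda_x &= \frac{3\sigma}{16} + \frac{1}{256} \left( \kappa_2^2 - 6\kappa_1\kappa_2 - 3\kappa_1^2 \right) \sigma^3 + O(\sigma^4), \\
\lambda_y &= \frac{3\sigma}{16} + \frac{1}{256} \left( \kappa_1^2 - 6\kappa_1\kappa_2 - 3\kappa_2^2 \right) \sigma^3 + O(\sigma^4), \\
\lambda_z &= \frac{1}{128} \left( 3\kappa_1^2 + 2\kappa_1\kappa_2 + 3\kappa_2^2 \right) \sigma^3 + O(\sigma^4),
\end{aligned} \tag{28}$$

and $\kappa_1 \geq \kappa_2$ are the principal curvatures of $S$ at $p$.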


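It follows from this result that, for $\sigma > 0$ small,

$$\begin{aligned}
\operatorname{tr} \Sigma_\alpha(p, \sigma) &= \frac{3}{8}\sigma + \frac{1}{64} (\kappa_1 - \kappa_2)^2 \sigma^3 + O(\sigma^4) \quad \text{and} \\
\det \Sigma_\alpha(p, \sigma) &= \frac{9}{32768} \left( 3\kappa_1^2 + 2\kappa_1\kappa_2 + 3\kappa_2^2 \right) \sigma^5 + O(\sigma^6).
\end{aligned} \tag{29}$$

As a consequence, $\kappa_1$ and $\kappa_2$ can be recovered from the
spectrum of $\Sigma_\alpha(p, \sigma)$ as a function of $\sigma$. Indeed,
from the small scale asymptotics of the trace and determinant of
$\Sigma_\alpha(p, \sigma)$, we can extract the values of
$(\kappa_1 - \kappa_2)^2$ and $3\kappa_1^2 + 2\kappa_1\kappa_2 + 3\kappa_2^2$,
from which we can determine the values of $\kappa_1$ and $\kappa_2$.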

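Proof of Proposition 2. Using cylindrical coordinates in the chosen
reference system, we can parametrize the patch $S \cap B(p, \sigma)$ as
$(\rho\cos\theta, \rho\sin\theta, z(\rho, \theta))$, for
$\theta \in [0, 2\pi]$, $\rho \in [0, \rho_\sigma(\theta)]$, where
$\rho_\sigma(\theta) = \sigma - \frac{1}{8} (\kappa_1\cos^2\theta + \kappa_2\sin^2\theta)^2 \sigma^3 + O(\sigma^4)$
and $z(\rho, \theta) = \frac{\rho^2}{2} (\kappa_1\cos^2\theta + \kappa_2\sin^2\theta) + O(\rho^3)$.
The area element on the patch is given by

$$dA = \left( \rho + \frac{\rho^3}{2} \left( \kappa_1^2\cos^2\theta + \kappa_2^2\sin^2\theta \right) + O(\rho^4) \right) d\rho\, d\theta. \tag{30}$$

Now we have all the ingredients needed to compute $\Sigma_\alpha(p, \sigma)$.
For example, to calculate the (1,1)-entry, we express
$\iint_{S \cap B(p, \sigma)} x^2\, dA$ as

$$\int_0^{2\pi} \int_0^{\rho_\sigma(\theta)} \rho^2\cos^2\theta \left( \rho + \frac{\rho^3}{2} \left( \kappa_1^2\cos^2\theta + \kappa_2^2\sin^2\theta \right) + O(\rho^4) \right) d\rho\, d\theta, \tag{31}$$

which after a simple but tedious calculation yields the desired result. The
computation of other entries of the matrix follows similar steps. □

4. Stability and Consistency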

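For each $p \in [1, \infty)$, let $\mathcal{P}_p(\mathbb{R}^d)$ denote the
collection of all Borel probability measures $\alpha$ on $\mathbb{R}^d$
whose $p$th moment $M_p(\alpha) = \int \|z\|^p \alpha(dz)$ is finite. We
adopt the notation $m_p(\alpha) = M_p^{1/p}(\alpha)$. For $p = \infty$, we
let $\mathcal{P}_\infty(\mathbb{R}^d)$ be the collection of all Borel
probability measures on $\mathbb{R}^d$ with bounded support and
$m_\infty(\alpha) = \sup\{\|z\|,\ z \in \operatorname{supp}[\alpha]\}$. By
Jensen's inequality, if $1 \leq q \leq p \leq \infty$, then
$\mathcal{P}_p(\mathbb{R}^d) \subset \mathcal{P}_q(\mathbb{R}^d)$ and
$m_q(\alpha) \leq m_p(\alpha)$, for any $\alpha \in \mathcal{P}_p(\mathbb{R}^d)$.

Definition 3. For $p \in [1, \infty]$ and $\lambda > 0$, we define
$\mathcal{P}_p^\lambda(\mathbb{R}^d) \subset \mathcal{P}_p(\mathbb{R}^d)$ as
the subset of all $\alpha \in \mathcal{P}_p(\mathbb{R}^d)$ such that
$\alpha(A) \leq \lambda \mathcal{L}(A)$, for all measurable sets $A$, where
$\mathcal{L}$ stands for Lebesgue measure.

Example 6. If $\alpha \in \mathcal{P}_p(\mathbb{R}^d)$ is absolutely
continuous with respect to the Lebesgue measure with density function
$f \in L^\infty(\mathbb{R}^d)$ satisfying $\|f\|_\infty \leq \lambda$, then
$\alpha \in \mathcal{P}_p^\lambda(\mathbb{R}^d)$.

Let us recall the definition of the $p$-Wasserstein distance
$W_p(\alpha, \beta)$ between $\alpha, \beta \in \mathcal{P}_p(\mathbb{R}^d)$.
Let $\Gamma(\alpha, \beta)$ be the collection of all couplings of $\alpha$
and $\beta$; that is, probability measures $\mu$ on
$\mathbb{R}^d \times \mathbb{R}^d$ such that $(\pi_1)_*(\mu) = \alpha$ and
$(\pi_2)_*(\mu) = \beta$, where $\pi_1, \pi_2 \colon \mathbb{R}^d \times
\mathbb{R}^d \to \mathbb{R}^d$ denote projections onto the first and second
components, respectively.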

Definition 4. For $p \in [1, \infty)$, the $p$-Wasserstein distance between
$\alpha, \beta \in \mathcal{P}_p(\mathbb{R}^d)$ is given by

$$W_p(\alpha, \beta) := \inf_{\mu \in \Gamma(\alpha, \beta)} \left( \iint \|z_1 - z_2\|^p\, \mu(dz_1 \times dz_2) \right)^{1/p}$$

and the $\infty$-Wasserstein distance between
$\alpha, \beta \in \mathcal{P}_\infty(\mathbb{R}^d)$ by

$$W_\infty(\alpha, \beta) := \inf_{\mu \in \Gamma(\alpha, \beta)} \sup\{\|z_1 - z_2\|,\ (z_1, z_2) \in \operatorname{supp}[\mu]\}.$$

Remark 4.

(i) For any $\alpha, \beta \in \mathcal{P}_p(\mathbb{R}^d)$,
$p \in [1, \infty]$, there exists a coupling that realizes the infimum in
the definition of $W_p(\alpha, \beta)$ (cf. [3]).

(ii) It is a standard result that, for each $p \in [1, \infty)$, $W_p$
defines a metric on $\mathcal{P}_p(\mathbb{R}^d)$ that is compatible with
weak convergence of probability measures [4].

(iii) If $\varphi \colon \mathbb{R}^d \to \mathbb{R}^d$ is an isometry, then
$W_p(\alpha, \beta) = W_p(\varphi_*(\alpha), \varphi_*(\beta))$, for any
$\alpha, \beta \in \mathcal{P}_p(\mathbb{R}^d)$.

4.1. Smooth Kernels

Theorem 1 (Stability for Smooth Kernels). Let $f \colon [0, \infty) \to
\mathbb{R}$ be as in Definition 2 with multiscale kernel $K$. Suppose that
$f$ is differentiable and there exists a constant $A_1 > 0$ such that
$r^{3/2} |f'(r)| \leq A_1$, $\forall r \geq 0$. Then, there is a constant
$A_f > 0$, that depends only on $f$, such that

$$\sup_{x \in \mathbb{R}^d} \|\Sigma_\alpha(x, \sigma) - \Sigma_\beta(x, \sigma)\| \leq \frac{\sigma A_f}{C_d(\sigma)}\, W_1(\alpha, \beta),$$

for any $\alpha, \beta \in \mathcal{P}_1(\mathbb{R}^d)$ and any
$\sigma > 0$. Here $\|\cdot\|$ is the norm associated with the inner product
defined in (3).

Theorem 1 shows that multiscale covariance fields yield a robust
representation of probability measures that makes their geometric
properties more readily accessible, as illustrated in our examples. In a
later section, we show that not only is $\Sigma_\alpha$ stable, but all the
information contained in the probability measure $\alpha$ is fully absorbed
into the multiscale CTF associated with the Gaussian kernel. In fact,
$\alpha$ may be recovered from the multiscale scalar field given by
$V_\alpha(x, \sigma) = \operatorname{tr} \Sigma_\alpha(x, \sigma)$,
$x \in \mathbb{R}^d$ and $\sigma > 0$.

The following lemma will be used in the proof of the stability theorem for
smooth kernels. To simplify notation, we define
$K_\sigma \colon \mathbb{R}^d \to \mathbb{R}$ by

$$K_\sigma(z) = \frac{1}{C_d(\sigma)}\, f\!\left(\frac{\|z\|^2}{\sigma^2}\right) \tag{32}$$

and $Q_\sigma \colon \mathbb{R}^d \to \mathbb{R}^d \otimes \mathbb{R}^d$ by

$$Q_\sigma(z) = (z \otimes z)\, K_\sigma(z). \tag{33}$$

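Lemma 1. Let $f$ be as in Definition 2 and suppose that $f$ is
differentiable and there is a constant $A_1 > 0$ such that
$r^{3/2} |f'(r)| \leq A_1$, $\forall r \geq 0$. Then, there is a constant
$A_f > 0$, that depends only on $f$, such that

$$\|Q_\sigma(z_1) - Q_\sigma(z_2)\| \leq \frac{\sigma A_f}{C_d(\sigma)}\, \|z_1 - z_2\|,$$

for any $z_1, z_2 \in \mathbb{R}^d$ and all $\sigma > 0$.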

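Proof. Let $z(t) = t z_1 + (1 - t) z_2$, $0 \leq t \leq 1$. Then,

$$Q_\sigma(z_1) - Q_\sigma(z_2) = \int_0^1 \frac{d}{dt} Q_\sigma(z(t))\, dt. \tag{34}$$

Since

$$\frac{d}{dt} Q_\sigma(z(t)) = \frac{(z_1 - z_2) \otimes z + z \otimes (z_1 - z_2)}{C_d(\sigma)}\, f\!\left(\frac{\|z\|^2}{\sigma^2}\right) + \frac{2\,(z \otimes z)}{\sigma^2 C_d(\sigma)}\, f'\!\left(\frac{\|z\|^2}{\sigma^2}\right) \langle z, z_1 - z_2 \rangle, \tag{35}$$

where $z = z(t)$, it follows that

$$\begin{aligned}
\left\| \frac{d}{dt} Q_\sigma(z(t)) \right\| &\leq \frac{2\|z\|}{C_d(\sigma)}\, f\!\left(\frac{\|z\|^2}{\sigma^2}\right) \|z_1 - z_2\| + \frac{2\|z\|^3}{\sigma^2 C_d(\sigma)} \left| f'\!\left(\frac{\|z\|^2}{\sigma^2}\right) \right| \|z_1 - z_2\| \\
&= \frac{2\sigma}{C_d(\sigma)} \left[ \frac{\|z\|}{\sigma}\, f\!\left(\frac{\|z\|^2}{\sigma^2}\right) + \frac{\|z\|^3}{\sigma^3} \left| f'\!\left(\frac{\|z\|^2}{\sigma^2}\right) \right| \right] \|z_1 - z_2\|.
\end{aligned} \tag{36}$$

Since $f$ is smooth, condition (c) of Definition 2 ensures that there is a
constant $A_2 > 0$ such that $\sqrt{r} f(r) \leq A_2$, $\forall r \geq 0$.
Moreover, by hypothesis, $r^{3/2} |f'(r)| \leq A_1$. Thus, (36) implies that

$$\left\| \frac{d}{dt} Q_\sigma(z(t)) \right\| \leq \frac{\sigma A_f}{C_d(\sigma)}\, \|z_2 - z_1\|, \tag{37}$$

where $A_f = 2(A_1 + A_2)$. The lemma follows from (34) and (37). □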
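
The Lipschitz estimate of Lemma 1 can be probed numerically for a concrete profile. The sketch below assumes the Gaussian profile $f(r) = e^{-r/2}$ in $\mathbb{R}^2$, for which $C_d(\sigma) = 2\pi\sigma^2$, and uses the elementary maxima $A_1 = \sup_r r^{3/2}|f'(r)| = \tfrac{1}{2} 3^{3/2} e^{-3/2}$ and $A_2 = \sup_r \sqrt{r} f(r) = e^{-1/2}$ with $A_f = 2(A_1 + A_2)$. The sampling scheme and function names are illustrative assumptions.

```python
import math
import random

def Q(z, sigma):
    """Q_sigma(z) = (z z^T) K_sigma(z) in R^2 for the profile f(r) = exp(-r / 2)."""
    k = math.exp(-(z[0] ** 2 + z[1] ** 2) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    return [[k * z[i] * z[j] for j in range(2)] for i in range(2)]

def frobenius_gap(a, b):
    """Frobenius norm of a - b for 2x2 matrices."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2)))

A1 = 0.5 * 3.0 ** 1.5 * math.exp(-1.5)  # sup_r r^(3/2) |f'(r)|, attained at r = 3
A2 = math.exp(-0.5)                      # sup_r sqrt(r) f(r), attained at r = 1
A_f = 2.0 * (A1 + A2)

def worst_ratio(sigma, trials=5000, seed=2):
    """Largest observed ||Q(z1) - Q(z2)|| / (bound * ||z1 - z2||) over random pairs."""
    rng = random.Random(seed)
    bound = sigma * A_f / (2.0 * math.pi * sigma ** 2)  # sigma A_f / C_d(sigma), d = 2
    worst = 0.0
    for _ in range(trials):
        z1 = (rng.uniform(-3, 3) * sigma, rng.uniform(-3, 3) * sigma)
        z2 = (rng.uniform(-3, 3) * sigma, rng.uniform(-3, 3) * sigma)
        dist = math.hypot(z1[0] - z2[0], z1[1] - z2[1])
        if dist > 0.0:
            worst = max(worst, frobenius_gap(Q(z1, sigma), Q(z2, sigma)) / (bound * dist))
    return worst
```

A returned value at or below 1 is consistent with the lemma; random sampling can only illustrate the bound, not prove it.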


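Proof of Theorem 1. Without loss of generality, we may assume that $x = 0$.
We express the covariance fields as

$$\Sigma_\alpha(x, \sigma) = \int Q_\sigma(z_1)\, \alpha(dz_1) \quad \text{and} \quad \Sigma_\beta(x, \sigma) = \int Q_\sigma(z_2)\, \beta(dz_2). \tag{38}$$

Given $\eta > 0$ satisfying $W_1(\alpha, \beta) < \eta$, let
$\mu \in \Gamma(\alpha, \beta)$ be a coupling such that

$$\iint \|z_1 - z_2\|\, \mu(dz_1 \times dz_2) < \eta. \tag{39}$$

We may write

$$\Sigma_\alpha(x, \sigma) = \iint Q_\sigma(z_1)\, \mu(dz_1 \times dz_2) \quad \text{and} \quad \Sigma_\beta(x, \sigma) = \iint Q_\sigma(z_2)\, \mu(dz_1 \times dz_2). \tag{40}$$

Therefore,

$$\|\Sigma_\alpha(x, \sigma) - \Sigma_\beta(x, \sigma)\| \leq \iint \|Q_\sigma(z_1) - Q_\sigma(z_2)\|\, \mu(dz_1 \times dz_2). \tag{41}$$

Lemma 1 and (41) imply that

$$\|\Sigma_\alpha(x, \sigma) - \Sigma_\beta(x, \sigma)\| \leq \frac{\sigma A_f}{C_d(\sigma)} \iint \|z_1 - z_2\|\, \mu(dz_1 \times dz_2) < \frac{\sigma A_f}{C_d(\sigma)}\, \eta. \tag{42}$$

Since (42) holds for any $\eta > W_1(\alpha, \beta)$, we can conclude that

$$\|\Sigma_\alpha(x, \sigma) - \Sigma_\beta(x, \sigma)\| \leq \frac{\sigma A_f}{C_d(\sigma)}\, W_1(\alpha, \beta), \tag{43}$$

as claimed. □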


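In what follows, given random vectors $y_i \in \mathbb{R}^d$,
$i \in \mathbb{N}$, we let $\alpha_n = \sum_{i=1}^{n} \delta_{y_i}/n$.

Corollary 1 (Consistency for Smooth Kernels). Let $K$ be a multiscale
kernel as in Theorem 1 and $\sigma > 0$. If $\alpha \in
\mathcal{P}_1(\mathbb{R}^d)$ and $y_i \in \mathbb{R}^d$, $i \in \mathbb{N}$,
are i.i.d. random variables with distribution $\alpha$, then

$$\sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x, \sigma) - \Sigma_\alpha(x, \sigma)\| \xrightarrow{\, n \to \infty \,} 0$$

almost surely.

Proof. Theorem 1 implies that

$$\sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x, \sigma) - \Sigma_\alpha(x, \sigma)\| \leq \frac{\sigma A_f}{C_d(\sigma)}\, W_1(\alpha_n, \alpha). \tag{44}$$

The conclusion follows from the fact that $W_1$ metrizes weak convergence
of probability measures in $\mathcal{P}_1(\mathbb{R}^d)$ and Varadarajan's
Theorem [35] about convergence of empirical measures on Polish spaces that
ensures that $\alpha_n$ converges weakly to $\alpha$ almost surely. □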









Corollary 1 guarantees the asymptotic consistency of empirical CTFs.
However, in applications, it is important to have estimates of the rate of
convergence, which we derive from the stability theorem and a result of
Fournier and Guillin [5, Theorem 1].

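Theorem 2 (Fournier and Guillin, [5]). Let $\alpha \in
\mathcal{P}_s(\mathbb{R}^d)$, where $s > 1$. If $y_1, \ldots, y_n$ are
i.i.d. random variables with distribution $\alpha$ and $p \in [1, s)$, then
there exists a constant $b > 0$ that depends only on $p$, $s$ and $d$ such
that

$$E\left[ W_p^p(\alpha, \alpha_n) \right] \leq b\, m_s^p(\alpha) \begin{cases}
n^{-\frac{1}{2}} + n^{-\frac{s-p}{s}} & \text{if } p > d/2 \text{ and } s \neq 2p; \\
n^{-\frac{1}{2}} \log(1 + n) + n^{-\frac{s-p}{s}} & \text{if } p = d/2 \text{ and } s \neq 2p; \\
n^{-\frac{p}{d}} + n^{-\frac{s-p}{s}} & \text{if } p \in [1, d/2) \text{ and } s \neq d/(d - p),
\end{cases}$$

for any $n \geq 1$.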

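Corollary 2. Let f be as in Theorem 1 and $\sigma > 0$. Suppose that $\alpha \in \mathcal{P}_3(\mathbb{R}^d)$ and $y_i$, $i \in \mathbb{N}$, are i.i.d. random variables with distribution $\alpha$. Then, there is a constant $b > 0$, that depends only on d, such that

\[ E\left[\sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x,\sigma) - \Sigma_\alpha(x,\sigma)\|\right] \le \frac{\sigma A_f b}{C_d(\sigma)} \, m_3^{1/3}(\alpha)
\begin{cases}
n^{-1/2} + n^{-2/3} & \text{if } d = 1; \\
n^{-1/2}\log(1+n) + n^{-2/3} & \text{if } d = 2; \\
n^{-1/d} + n^{-2/3} & \text{if } d \ge 3.
\end{cases} \]

Proof. Since $\mathcal{P}_3(\mathbb{R}^d) \subset \mathcal{P}_1(\mathbb{R}^d)$, $\alpha$ is also an element of $\mathcal{P}_1(\mathbb{R}^d)$. The conclusion follows by invoking Theorem 1 and the result of Fournier and Guillin with p = 1 and s = 3. The constant b depends only on d because we are fixing p and s. □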
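To get a feel for how the bound in Corollary 2 scales with the sample size, the following sketch evaluates only its n-dependent factor for p = 1 and s = 3. The function name `fg_rate` is hypothetical, and the constants in front of the rate ($b$, $A_f$, $C_d(\sigma)$, $m_3(\alpha)$) are deliberately left out, so only the relative decay across n and d is meaningful.

```python
import math


def fg_rate(n: int, d: int) -> float:
    """n-dependent factor of the bound in Corollary 2 (p = 1, s = 3).

    The multiplicative constants of the corollary are omitted on purpose.
    """
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive integers")
    tail = n ** (-2.0 / 3.0)  # n^{-(s-p)/s} with s = 3, p = 1
    if d == 1:
        return n ** -0.5 + tail
    if d == 2:
        return n ** -0.5 * math.log(1 + n) + tail
    return n ** (-1.0 / d) + tail


if __name__ == "__main__":
    # The decay slows down markedly as the dimension grows.
    for d in (1, 2, 3, 10):
        print(d, [round(fg_rate(n, d), 4) for n in (10**2, 10**4, 10**6)])
```

For $d \ge 3$ the term $n^{-1/d}$ dominates, which is the usual dimension dependence of empirical Wasserstein rates.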

Theorem 1 ensures that multiscale covariance fields are stable. However, the results do not apply to some discontinuous kernels of practical interest. Nonetheless, we prove a stability theorem for the truncation kernel, as well as pointwise convergence results for more general kernels.
4.2. The Truncation Kernel

We begin our discussion of covariance fields associated with the truncation kernel with a stability theorem with respect to the $\infty$-Wasserstein metric. In preparation for the proof of the theorem, we introduce some notation. For $0 < a < b$, let

\[ R_d(a,b) = \{ y \in \mathbb{R}^d : a \le \|y\| \le b \} \tag{45} \]

and let

\[ S_d(a,b) = \int_{R_d(a,b)} \|y\|^2 \, dy \tag{46} \]

be its moment of inertia about the origin. We will use the fact that for any $B \ge b$, the inequality

\[ S_d(a,b) \le (b-a) \, \frac{d v_d}{d+2} \, \frac{B^{d+2}}{B-a} \tag{47} \]

holds. Indeed,

\[ S_d(a,b) = \frac{d v_d}{d+2} \, a^{d+2} \left( (b/a)^{d+2} - 1 \right) \le (b-a) \, \frac{d v_d}{d+2} \, \frac{B^{d+2} - a^{d+2}}{B-a} \le (b-a) \, \frac{d v_d}{d+2} \, \frac{B^{d+2}}{B-a}, \tag{48} \]

where the first inequality uses the convexity of $t \mapsto t^{d+2}$.
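Inequality (47) is easy to sanity-check numerically. The sketch below assumes the closed form $S_d(a,b) = d v_d (b^{d+2} - a^{d+2})/(d+2)$, obtained by integrating in polar coordinates, and the standard formula $v_d = \pi^{d/2}/\Gamma(d/2+1)$ for the volume of the unit ball. The helper names are hypothetical.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume v_d of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def moment_of_inertia(d: int, a: float, b: float) -> float:
    """Closed form of S_d(a, b), the integral of ||y||^2 over a <= ||y|| <= b."""
    return d * unit_ball_volume(d) * (b ** (d + 2) - a ** (d + 2)) / (d + 2)


def rhs_47(d: int, a: float, b: float, B: float) -> float:
    """Right-hand side of inequality (47); requires B >= b > a."""
    return (b - a) * d * unit_ball_volume(d) / (d + 2) * B ** (d + 2) / (B - a)


if __name__ == "__main__":
    for d in (1, 2, 3):
        a, b, B = 0.5, 0.8, 1.2
        print(d, moment_of_inertia(d, a, b), rhs_47(d, a, b, B))
```

In every row the first printed value should not exceed the second.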


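Let $\mathcal{P}_\infty(\mathbb{R}^d)$ and $\mathcal{P}_\infty^\lambda(\mathbb{R}^d)$ be as defined earlier.

Theorem 3 (Stability for the Truncation Kernel). Let $\sigma > 0$ and $\lambda > 0$. Suppose $\Omega \subset \mathbb{R}^d$ is a compact set and let $c > 0$ satisfy $\mathrm{diam}(\Omega) \le c$. There is a constant $A = A(\sigma, d, c) > 0$ such that if $\alpha \in \mathcal{P}_\infty^\lambda(\mathbb{R}^d)$ and $\beta \in \mathcal{P}_\infty(\mathbb{R}^d)$ have their supports contained in $\Omega$, then the multiscale covariance tensor fields of $\alpha$ and $\beta$ associated with the truncation kernel satisfy

\[ \sup_{x \in \mathbb{R}^d} \|\Sigma_\alpha(x,\sigma) - \Sigma_\beta(x,\sigma)\| \le \lambda A \, W_\infty(\alpha,\beta). \]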


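Proof. Without loss of generality, we may assume that $x \in \mathbb{R}^d$ is the origin. We abbreviate $\eta = W_\infty(\alpha,\beta)$ and let $\mu \in \Gamma(\alpha,\beta)$ be a coupling that realizes $\eta$. Clearly, $\eta \le \mathrm{diam}(\Omega) \le c$.

Using the notation introduced in (45), let $A_\eta = R_d(\sigma, \sigma+\eta)$. We write

\[ \Sigma_\alpha(x,\sigma) - \Sigma_\beta(x,\sigma) = \underbrace{\Sigma_\alpha(x,\sigma) - \Sigma_\alpha(x,\sigma+\eta)}_{T_1} + \underbrace{\Sigma_\alpha(x,\sigma+\eta) - \Sigma_\beta(x,\sigma)}_{T_2}. \tag{49} \]

Using the fact that $\alpha \in \mathcal{P}_\infty^\lambda(\mathbb{R}^d)$, we bound the norm of $T_1$ as follows:

\[ \|T_1\| = \frac{1}{\sigma^d v_d} \left\| \int_{A_\eta} y \otimes y \, \alpha(dy) \right\| \le \frac{1}{\sigma^d v_d} \int_{A_\eta} \|y\|^2 \, \alpha(dy) \le \frac{\lambda}{\sigma^d v_d} \int_{A_\eta} \|y\|^2 \, dy = \frac{\lambda}{\sigma^d v_d} \, S_d(\sigma, \sigma+\eta). \tag{50} \]

Using (47) with $a = \sigma$, $b = \sigma + \eta$ and $B = \sigma + c$, we have that

\[ S_d(\sigma, \sigma+\eta) \le \eta \, \frac{d v_d}{d+2} \, \frac{(\sigma+c)^{d+2}}{c}. \tag{51} \]

Thus,

\[ \|T_1\| \le \lambda \, \frac{d}{d+2} \, \frac{(\sigma+c)^{d+2}}{c \, \sigma^d} \, \eta. \tag{52} \]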

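Now we examine $T_2$. Let $I : \mathbb{R}^d \to \mathbb{R}$ be the characteristic function of the closed ball of radius 1 centered at the origin, and let $B_r$ denote the closed ball of radius r centered at the origin. Using the coupling $\mu$, we write

\[ T_2 = \frac{1}{\sigma^d v_d} \iint \left[ (y_1 \otimes y_1) \, I\!\left(\frac{y_1}{\sigma+\eta}\right) - (y_2 \otimes y_2) \, I\!\left(\frac{y_2}{\sigma}\right) \right] \mu(dy_1 \times dy_2). \tag{53} \]

Note that the integrand in (53) vanishes on $(\mathbb{R}^d \setminus B_{\sigma+\eta}) \times (\mathbb{R}^d \setminus B_\sigma)$ and the integral also vanishes over $(\mathbb{R}^d \setminus B_{\sigma+\eta}) \times B_\sigma$ because this subdomain is disjoint from $\mathrm{supp}[\mu]$. Combining these remarks with

\[ y_1 \otimes y_1 - y_2 \otimes y_2 = (y_1 - y_2) \otimes y_1 + y_2 \otimes (y_1 - y_2), \tag{54} \]

we may rewrite (53) as

\[ \begin{aligned}
T_2 &= \frac{1}{\sigma^d v_d} \iint_{B_{\sigma+\eta} \times B_\sigma} \left[ (y_1 \otimes y_1) - (y_2 \otimes y_2) \right] \mu(dy_1 \times dy_2) + \frac{1}{\sigma^d v_d} \iint_{B_{\sigma+\eta} \times (\mathbb{R}^d \setminus B_\sigma)} y_1 \otimes y_1 \, \mu(dy_1 \times dy_2) \\
&= \frac{1}{\sigma^d v_d} \iint_{B_{\sigma+\eta} \times B_\sigma} (y_1 - y_2) \otimes y_1 \, \mu(dy_1 \times dy_2) + \frac{1}{\sigma^d v_d} \iint_{B_{\sigma+\eta} \times B_\sigma} y_2 \otimes (y_1 - y_2) \, \mu(dy_1 \times dy_2) \\
&\quad + \frac{1}{\sigma^d v_d} \iint_{R_d((\sigma-\eta)_+,\, \sigma+\eta) \times (\mathbb{R}^d \setminus B_\sigma)} y_1 \otimes y_1 \, \mu(dy_1 \times dy_2).
\end{aligned} \tag{55} \]

For the last equality, we used again the fact that

\[ (y_1, y_2) \in \mathrm{supp}[\mu] \implies \|y_1 - y_2\| \le \eta. \tag{56} \]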


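From (55) and (56), using the facts that $\|y_1\| \le \sigma + \eta$ for $y_1 \in B_{\sigma+\eta}$ and $\|y_2\| \le \sigma$ for $y_2 \in B_\sigma$, we may conclude that

\[ \begin{aligned}
\|T_2\| &\le \frac{\sigma+\eta}{\sigma^d v_d} \iint_{B_{\sigma+\eta} \times B_\sigma} \|y_1 - y_2\| \, \mu(dy_1 \times dy_2) + \frac{\sigma}{\sigma^d v_d} \iint_{B_{\sigma+\eta} \times B_\sigma} \|y_1 - y_2\| \, \mu(dy_1 \times dy_2) \\
&\quad + \frac{1}{\sigma^d v_d} \iint_{R_d((\sigma-\eta)_+,\, \sigma+\eta) \times (\mathbb{R}^d \setminus B_\sigma)} \|y_1\|^2 \, \mu(dy_1 \times dy_2) \\
&\le \frac{\sigma+c}{\sigma^d v_d} \, \eta \int_{B_{\sigma+\eta}} \alpha(dy_1) + \frac{\sigma}{\sigma^d v_d} \, \eta \int_{B_{\sigma+\eta}} \alpha(dy_1) + \frac{1}{\sigma^d v_d} \int_{R_d((\sigma-\eta)_+,\, \sigma+\eta)} \|y_1\|^2 \, \alpha(dy_1),
\end{aligned} \tag{57} \]

where $(\sigma - \eta)_+ = \max\{\sigma - \eta, 0\}$. Since $\alpha \in \mathcal{P}_\infty^\lambda(\mathbb{R}^d)$ and $\eta \le c$,

\[ \int_{B_{\sigma+\eta}} \alpha(dy_1) \le \lambda \int_{B_{\sigma+\eta}} dy_1 = \lambda (\sigma+\eta)^d v_d \le \lambda (\sigma+c)^d v_d \tag{58} \]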
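and

\[ \int_{R_d((\sigma-\eta)_+,\, \sigma+\eta)} \|y_1\|^2 \, \alpha(dy_1) \le \lambda \int_{R_d((\sigma-\eta)_+,\, \sigma+\eta)} \|y_1\|^2 \, dy_1 = \lambda \, S_d((\sigma-\eta)_+, \sigma+\eta) \le 2\lambda \, \frac{d v_d}{d+2} \, \frac{(\sigma+c)^{d+2}}{c} \, \eta, \tag{59} \]

where we used (47) with $a = (\sigma-\eta)_+$, $b = \sigma + \eta$ and $B = \sigma + c$. Combining (57), (58) and (59), we obtain

\[ \|T_2\| \le \frac{2\sigma + c}{\sigma^d} \, \lambda (\sigma+c)^d \, \eta + \frac{2\lambda}{\sigma^d} \, \frac{d}{d+2} \, \frac{(\sigma+c)^{d+2}}{c} \, \eta. \tag{60} \]

From (49), (52) and (60), it follows that

\[ \begin{aligned}
\|\Sigma_\alpha(x,\sigma) - \Sigma_\beta(x,\sigma)\| &\le \left( \lambda \, \frac{(2\sigma+c)(\sigma+c)^d}{\sigma^d} + \lambda \, \frac{d \, (\sigma+c)^{d+2}}{(d+2) \, c \, \sigma^d} + \lambda \, \frac{2d \, (\sigma+c)^{d+2}}{(d+2) \, c \, \sigma^d} \right) W_\infty(\alpha,\beta) \\
&= \lambda \, A(\sigma,d,c) \, W_\infty(\alpha,\beta).
\end{aligned} \tag{61} \]

Since $x \in \mathbb{R}^d$ is arbitrary, the claim follows. □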


We now derive a consistency result and estimates for the rate of convergence of empirical approximations to multiscale covariance fields. The following result is a $W_\infty$-counterpart to the theorem by Fournier and Guillin stated above.

Theorem 4 (García Trillos and Slepčev [8]). Let $\Omega \subset \mathbb{R}^d$ be a bounded connected open subset with Lipschitz boundary. Let $\alpha$ be a probability measure on $\Omega$ with density $f_\alpha : \Omega \to (0, \infty)$ such that there exists $\lambda \ge 1$ with $\lambda^{-1} \le f_\alpha(x) \le \lambda$, for all $x \in \Omega$, and let $y_1, \ldots, y_n$ be i.i.d. random variables with distribution $\alpha$. Then, there exist constants $c_1, C_1, C_2 > 0$, depending only on $\Omega$ and $\lambda$, such that for all $n \in \mathbb{N}$ and $p \ge 1$,

\[ P\left( W_\infty(\alpha,\alpha_n) \le (C_1 + C_2 \sqrt{p}) \, r_d(n) \right) \ge 1 - c_1 n^{-p}, \]

where $r_2(n) = \log(n)^{3/4} / n^{1/2}$ and $r_d(n) = \log(n)^{1/d} / n^{1/d}$, for $d \ge 3$.

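Corollary 3 (Consistency for the Truncation Kernel). Let $\alpha$ be a probability measure on $\mathbb{R}^d$ with density $f_\alpha$ and let $\Omega_\alpha$ be the interior of the support of $\alpha$. Assume that $\Omega_\alpha$ is bounded and connected with Lipschitz boundary $\partial\Omega_\alpha$. Furthermore, assume that there exists $\lambda \ge 1$ such that $\lambda^{-1} \le f_\alpha(z) \le \lambda$, for all $z \in \Omega_\alpha$. If $y_i$, $i \in \mathbb{N}$, are i.i.d. random variables with distribution $\alpha$, then, for any $p \ge 1$, there are constants $C = C(\Omega_\alpha, \lambda, p) > 0$ and $c_1 = c_1(\Omega_\alpha, \lambda) > 0$ such that

\[ P\left( \sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x,\sigma) - \Sigma_\alpha(x,\sigma)\| \le C \, r_d(n) \right) \ge 1 - c_1 n^{-p}. \]

Here, $\alpha_n = \sum_{i=1}^n \delta_{y_i}/n$.

Proof. We use Theorem 4 and write $C' = C_1 + \sqrt{p}\, C_2$. Theorem 3 implies that there is a constant $C'' = C''(\Omega_\alpha, \lambda) > 0$ such that

\[ \sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x,\sigma) - \Sigma_\alpha(x,\sigma)\| \le C'' \, W_\infty(\alpha,\alpha_n). \tag{62} \]

Thus,

\[ P\left( \sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x,\sigma) - \Sigma_\alpha(x,\sigma)\| \le C' C'' r_d(n) \right) \ge P\left( W_\infty(\alpha,\alpha_n) \le C' r_d(n) \right) \ge 1 - c_1 n^{-p}. \tag{63} \]

The claim follows by setting $C = C' C''$. □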

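Corollary 4. Let $\sigma > 0$ and $p \ge 1$. Under the assumptions of Corollary 3 for the truncation kernel, there exist $N = N(\sigma, \Omega_\alpha, \lambda) \in \mathbb{N}$ and a constant $A = A(\sigma, \Omega_\alpha, \lambda) > 0$ such that

\[ E\left[ \sup_{x \in \mathbb{R}^d} \|\Sigma_{\alpha_n}(x,\sigma) - \Sigma_\alpha(x,\sigma)\| \right] \le A \, r_d(n), \]

for all $n \ge N$.

Proof. We apply to $W_\infty(\alpha,\alpha_n)$ the identity $E(Z) = \int_0^\infty P(Z > t)\, dt$ that is valid for any non-negative random variable Z with finite first moment. Since $W_\infty(\alpha,\alpha_n) \le D = \mathrm{diam}(\Omega_\alpha)$, we get

\[ E\left[ W_\infty(\alpha,\alpha_n) \right] = \int_0^D P\left( W_\infty(\alpha,\alpha_n) > t \right) dt. \tag{64} \]

Theorem 4 implies that

\[ P\left( W_\infty(\alpha,\alpha_n) > (C_1 + \sqrt{p}\, C_2) \, r_d(n) \right) \le c_1 n^{-p}. \tag{65} \]

Let $t_0 = \min\{ D, (C_1 + \sqrt{p}\, C_2) \, r_d(n) \}$. From (64) and (65),

\[ \begin{aligned}
E\left[ W_\infty(\alpha,\alpha_n) \right] &= \int_0^{t_0} P\left( W_\infty(\alpha,\alpha_n) > t \right) dt + \int_{t_0}^{D} P\left( W_\infty(\alpha,\alpha_n) > t \right) dt \\
&\le t_0 + D \, P\left( W_\infty(\alpha,\alpha_n) > (C_1 + \sqrt{p}\, C_2) \, r_d(n) \right) \\
&\le (C_1 + \sqrt{p}\, C_2) \, r_d(n) + D \, c_1 n^{-p}.
\end{aligned} \tag{66} \]

Fixing p, say p = 2, for n sufficiently large, the dominant term in this last expression is the one involving $r_d(n)$. Thus, the claim follows from (66) and Theorem 3 applied to $\alpha$ and $\alpha_n$. □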
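The last step of the proof rests on the term $D c_1 n^{-p}$ becoming negligible next to $(C_1 + \sqrt{p}\, C_2)\, r_d(n)$. The sketch below evaluates both terms of (66) separately. The constants $C_1$, $C_2$, $c_1$ and $D$ depend on the domain and the density and are not known in general, so the values used here are purely hypothetical placeholders, as are the function names.

```python
import math


def r_d(n: int, d: int) -> float:
    """Rate r_d(n) from Theorem 4 (defined for d >= 2 and n >= 2)."""
    if d < 2 or n < 2:
        raise ValueError("r_d(n) is used here only for d >= 2 and n >= 2")
    if d == 2:
        return math.log(n) ** 0.75 / math.sqrt(n)
    return (math.log(n) / n) ** (1.0 / d)


def terms_of_66(n, d, p, C1, C2, c1, D):
    """Return the two summands on the last line of (66)."""
    main = (C1 + math.sqrt(p) * C2) * r_d(n, d)
    tail = D * c1 * n ** (-p)
    return main, tail


if __name__ == "__main__":
    # Placeholder constants; p = 2 as suggested in the proof.
    for n in (10, 100, 1000, 10000):
        print(n, terms_of_66(n, d=2, p=2, C1=1.0, C2=1.0, c1=1.0, D=1.0))
```

With p = 2 the tail decays like $n^{-2}$, much faster than $r_d(n)$, which is why a single constant A suffices for all n beyond some N.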

Remark 5. We carry out an experiment to test the convergence rates obtained in Corollary 4. We consider the probability measure $\alpha$ supported on the unit circle $C \subset \mathbb{R}^2$ induced by the normalized arc length element $ds/(2\pi)$. In this case, for the truncation kernel, $\Sigma_\alpha$ was calculated explicitly in Example 2. We consider sets of i.i.d. samples of size n, $10 \le n \le 10^{\ldots}$. For each n, thirty sets of samples are taken. For each such set, we compute $\Sigma_{\alpha_n}$ and estimate the "error" as $\max \|\Sigma_{\alpha_n}(x,\sigma) - \Sigma_\alpha(x,\sigma)\|$, for $\sigma = 0.6$, where the maximum is taken over gridpoints on a 24 × 24 grid on the square [−1.5, 1.5] × [−1.5, 1.5]. We let $\varepsilon_n$ be the average error over all thirty sets of samples. Figure 6 shows a plot (in blue) of $\varepsilon_n$ in log-log scale. To compare with the predicted rates, we use a least-squares fit, in log-log scale, of the form $\varepsilon = C r_2(n) = C \log(n)^{3/4}/n^{1/2}$, also shown in Figure 6 (in red). The discrepancy between the predicted and observed rates suggests that Corollary 4 might not be optimal. A curve of the form $\varepsilon = C n^{-1/2}$, shown in green, produces a tighter fit to the data, suggesting that the optimal bound might be $O(n^{-1/2})$.
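The experiment in Remark 5 can be imitated on a small scale. The sketch below is not the authors' code: it assumes that the empirical CTF for the truncation kernel in the plane is $\Sigma_{\alpha_n}(x,\sigma) = \frac{1}{\sigma^2 \pi n} \sum_{\|y_i - x\| \le \sigma} (y_i - x) \otimes (y_i - x)$, which matches the normalization $\sigma^d v_d$ used in the proof of Theorem 3, and it replaces the closed-form field of Example 2 by a dense, evenly spaced reference sample. All function names are hypothetical.

```python
import math
import random


def truncation_ctf(points, x, sigma):
    """Empirical CTF at x for planar data, truncation kernel (assumed form)."""
    norm = sigma ** 2 * math.pi * len(points)  # sigma^d * v_d * n with d = 2
    sxx = sxy = syy = 0.0
    for y0, y1 in points:
        u0, u1 = y0 - x[0], y1 - x[1]
        if u0 * u0 + u1 * u1 <= sigma * sigma:  # keep points in the closed ball
            sxx += u0 * u0
            sxy += u0 * u1
            syy += u1 * u1
    return [[sxx / norm, sxy / norm], [sxy / norm, syy / norm]]


def frobenius_gap(a, b):
    """Frobenius norm of the difference of two 2 x 2 matrices."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2)))


def circle_sample(n, rng):
    """n i.i.d. points, uniform with respect to arc length on the unit circle."""
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(math.cos(t), math.sin(t)) for t in angles]


def max_grid_error(sample, reference, sigma=0.6, side=24, half_width=1.5):
    """Largest gap between the two fields over a side x side grid."""
    grid = [-half_width + 2.0 * half_width * k / (side - 1) for k in range(side)]
    return max(
        frobenius_gap(truncation_ctf(sample, (gx, gy), sigma),
                      truncation_ctf(reference, (gx, gy), sigma))
        for gx in grid for gy in grid
    )


# Dense, evenly spaced points stand in for the exact field of the circle.
reference = [(math.cos(2 * math.pi * k / 720), math.sin(2 * math.pi * k / 720))
             for k in range(720)]

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (50, 200, 800):
        print(n, max_grid_error(circle_sample(n, rng), reference))
```

Averaging the printed error over many independent samples for each n, and plotting it against n in log-log scale, gives a small-scale analogue of the blue curve described in the remark.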


Figure 6: Log-log plots of experimental error rates (in blue) for empirical covariance fields, rates predicted by Corollary 4 (in red), and a least-squares fit of order $n^{-1/2}$ (in green).

4.3. General Kernels

We conclude the discussion of convergence of empirical CTFs with a pointwise central limit theorem (CLT) that holds for kernels in the full generality of our definition of multiscale kernels. One may think of it as a CLT for each entry of the matrix $\Sigma_\alpha(x,\sigma)$. If $e_1, \ldots, e_d$ is an orthonormal basis of $\mathbb{R}^d$, the (i, j)-entry of the covariance matrix in this coordinate system is given by $\Sigma_\alpha(x,\sigma)(e_i, e_j)$, the bilinear form $\Sigma_\alpha(x,\sigma)$ evaluated at $(e_i, e_j)$. In matrix notation, this is the same as $\langle e_i, \Sigma_\alpha(x,\sigma) e_j \rangle$. More generally, for fixed $u, v, x \in \mathbb{R}^d$ and $\sigma > 0$, we consider

\[ \Sigma_\alpha(x,\sigma)(u,v) = \int (y - x) \otimes (y - x)(u,v) \, K(x,y,\sigma) \, \alpha(dy). \tag{67} \]

Consider the random variable

\[ z_{u,v}(y) = (y - x) \otimes (y - x)(u,v) \, K(x,y,\sigma), \tag{68} \]

where y has distribution $\alpha$. Clearly,

\[ E\left[ z_{u,v} \right] = \Sigma_\alpha(x,\sigma)(u,v). \tag{69} \]

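Theorem 5 (Central Limit). If f is as in the definition of multiscale kernels, then $z_{u,v}$ has finite variance $\sigma_{u,v}^2$. Moreover, if $z_i$, $i \in \mathbb{N}$, are i.i.d. random variables with the same distribution as $z_{u,v}$, then

\[ \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n z_i - \Sigma_\alpha(x,\sigma)(u,v) \right) \longrightarrow \mathcal{N}(0, \sigma_{u,v}^2) \]

as $n \to \infty$, where convergence is in distribution and $\mathcal{N}(0, \sigma_{u,v}^2)$ is normally distributed with mean zero and variance $\sigma_{u,v}^2$.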
Proof. We show that $z_{u,v}$ has finite second moment. From (68) and the definition of the kernel,



< 

< 


mil t; 




o’^ll'“llll^ll [\\y- 

Clia) J 


-"‘f («»(.») 


( 70 ) 


CK^) 



The last inequality follows from condition (c) in Definition that ensures 
that r^/^(r) < C^, for any r > 0. The theorem now follows from a direct 
application of the classical CLT. □ 


Remark 6. Note that (70) implies that if $\|u\| = \|v\| = 1$, then




(71) 


giving a uniform bound on the variance of $z_{u,v}$ over $x \in \mathbb{R}^d$ and unit vectors $u, v \in \mathbb{R}^d$.


5. Multiscale Fréchet Functions

The mean of a random vector $y \in \mathbb{R}^d$ is a simple and yet oftentimes informative, "one-element" summary of the distribution of y. If y has finite second moment and is distributed according to the probability measure $\alpha$, then the mean may be characterized more geometrically as the unique minimizer of the Fréchet function

\[ V_\alpha(x) = E\left[ \|y - x\|^2 \right] = \int \|y - x\|^2 \, \alpha(dy), \tag{72} \]

which measures the spread of y about x. The mean, however, is not as effective for complex distributions of practical interest such as multimodal distributions or those supported in nonlinear subspaces. In this section, we introduce a multiscale analogue of the Fréchet function that is rich in information about the shape of the distribution of y. At each fixed scale, the local minima of the function may be viewed as localized analogues of the mean, as illustrated in examples below. However, instead of just focusing on the local extrema, we take the view that it is more informative to investigate the behavior of the full multiscale Fréchet function, as this lets us uncover more information about the distribution of y.
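Definition 5. Let $f : [0, \infty) \to \mathbb{R}$ be as in the definition of multiscale kernels, with associated kernel $K : \mathbb{R}^d \times \mathbb{R}^d \times (0, \infty) \to \mathbb{R}$. The multiscale Fréchet function $V_\alpha : \mathbb{R}^d \times (0, \infty) \to \mathbb{R}$ is defined as

\[ V_\alpha(x,\sigma) := \int \|y - x\|^2 \, K(x,y,\sigma) \, \alpha(dy). \]

Proposition 3. For each $\sigma > 0$, the multiscale Fréchet function satisfies

\[ V_\alpha(x,\sigma) = \mathrm{tr}\, \Sigma_\alpha(x,\sigma). \]

Proof. Let $\{e_1, \ldots, e_d\} \subset \mathbb{R}^d$ be an orthonormal basis. Then,

\[ \|y - x\|^2 = \sum_{i=1}^d \langle y - x, e_i \rangle^2 = \sum_{i=1}^d (y - x) \otimes (y - x)(e_i, e_i). \tag{73} \]

Hence,

\[ V_\alpha(x,\sigma) = \sum_{i=1}^d \int (y - x) \otimes (y - x)(e_i, e_i) \, K(x,y,\sigma) \, \alpha(dy) = \sum_{i=1}^d \Sigma_\alpha(x,\sigma)(e_i, e_i) = \mathrm{tr}\, \Sigma_\alpha(x,\sigma), \tag{74} \]

as claimed. □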


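Corollary 5 (Stability). Let $f : [0, \infty) \to \mathbb{R}$ be as in the definition of multiscale kernels, with multiscale kernel K. Suppose that f is differentiable and there exists a constant $A > 0$ such that $|f'(r)| \le A$, $\forall r \ge 0$. Then, there is a constant $A_f > 0$, that depends only on f, such that

\[ \sup_{x \in \mathbb{R}^d} |V_\alpha(x,\sigma) - V_\beta(x,\sigma)| \le \frac{\sigma \, d \, A_f}{C_d(\sigma)} \, W_1(\alpha,\beta), \]

for any $\alpha, \beta \in \mathcal{P}_1(\mathbb{R}^d)$ and any $\sigma > 0$.

Proof. The result follows from Proposition 3, Theorem 1 and the fact that for any d × d matrix X, $|\mathrm{tr}\, X| \le d \|X\|$, where $\|X\|$ is the Frobenius norm of X. □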
Similarly, Corollary 1 and Proposition 3 yield the following consistency result for multiscale Fréchet functions.

Corollary 6 (Consistency). Suppose that $\alpha \in \mathcal{P}_1(\mathbb{R}^d)$. Let $y_i \in \mathbb{R}^d$, $i \in \mathbb{N}$, be i.i.d. random variables with distribution $\alpha$ and K a multiscale kernel as in Theorem 1. Then, for each fixed $\sigma > 0$,

\[ \sup_{x \in \mathbb{R}^d} |V_\alpha(x,\sigma) - V_{\alpha_n}(x,\sigma)| \xrightarrow{n \to \infty} 0 \]

almost surely.

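The following result about convergence of multiscale Fréchet functions is an immediate consequence of Corollary 2 and Corollary 5.

Corollary 7. Let f be as in Theorem 1 and $\sigma > 0$. Suppose that $\alpha \in \mathcal{P}_3(\mathbb{R}^d)$ and $y_i$, $i \in \mathbb{N}$, are i.i.d. random variables with distribution $\alpha$. Then, there is a constant $b > 0$, that depends only on d, such that

\[ E\left[ \sup_{x \in \mathbb{R}^d} |V_{\alpha_n}(x,\sigma) - V_\alpha(x,\sigma)| \right] \le \frac{\sigma \, d \, A_f b}{C_d(\sigma)} \, m_3^{1/3}(\alpha)
\begin{cases}
n^{-1/2} + n^{-2/3} & \text{if } d = 1; \\
n^{-1/2}\log(1+n) + n^{-2/3} & \text{if } d = 2; \\
n^{-1/d} + n^{-2/3} & \text{if } d \ge 3.
\end{cases} \]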


Remark 7. Analogous stability and consistency results for the truncation kernel follow from Theorem 3, Corollary 3 and Corollary 4.

For more general kernels, the following pointwise central limit theorem holds. For fixed $x \in \mathbb{R}^d$ and $\sigma > 0$, let

\[ t(y) = \|y - x\|^2 \, K(x,y,\sigma), \tag{75} \]

whose expected value is $E[t] = V_\alpha(x,\sigma)$. As in Theorem 5, the variance of t is finite and denoted $\sigma_t^2$.

Theorem 6 (Central Limit). Let f be as in the definition of multiscale kernels. If $t_i \in \mathbb{R}$, $i \in \mathbb{N}$, are i.i.d. random variables with the same distribution as t, then

\[ \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n t_i - V_\alpha(x,\sigma) \right) \longrightarrow \mathcal{N}(0, \sigma_t^2), \]

as $n \to \infty$, where convergence is in distribution and $\mathcal{N}(0, \sigma_t^2)$ is normally distributed with mean zero and variance $\sigma_t^2$.
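Multiscale Fréchet functions not only give stable representations of probability measures, but any probability measure $\alpha$ may be fully recovered from its multiscale Fréchet function associated with the Gaussian kernel, as the following result shows.

Proposition 4. Let $\sigma > 0$ be fixed. Any probability measure $\alpha$ is completely determined by the Fréchet function $V_\alpha(\cdot,\sigma)$ associated with the Gaussian kernel at scale $\sigma$.

Proof. Let $h_\sigma : \mathbb{R}^d \to \mathbb{R}$ be given by

\[ h_\sigma(y) = \frac{\|y\|^2}{(2\pi\sigma^2)^{d/2}} \exp\left( -\frac{\|y\|^2}{2\sigma^2} \right). \tag{76} \]

Then, for the Gaussian kernel, we may express the multiscale Fréchet function as the convolution $V_\alpha(x,\sigma) = (h_\sigma * \alpha)(x)$. Under Fourier transform, for each fixed $\sigma > 0$, we obtain

\[ \widehat{V}_\alpha(\xi,\sigma) = \widehat{h}_\sigma(\xi) \, \phi_\alpha(-2\pi\xi), \tag{77} \]

where $\phi_\alpha$ is the characteristic function of $\alpha$ defined as $\phi_\alpha(\xi) = \int e^{i \langle \xi, x \rangle} \, \alpha(dx)$. Therefore,

\[ \phi_\alpha(\xi) = \widehat{V}_\alpha(-\xi/2\pi, \sigma) \, / \, \widehat{h}_\sigma(-\xi/2\pi), \tag{78} \]

provided that $\widehat{h}_\sigma(-\xi/2\pi) \ne 0$. A calculation shows that

\[ \widehat{h}_\sigma(\xi) = \sigma^2 \left( d - 4\pi^2\sigma^2 \|\xi\|^2 \right) \exp\left( -2\pi^2\sigma^2 \|\xi\|^2 \right), \tag{79} \]

which only vanishes at points $\xi$ on the sphere of radius $\sqrt{d}/(2\pi\sigma)$ about the origin. Thus, (78) implies that we can recover $\phi_\alpha(\xi)$ from $V_\alpha(\cdot,\sigma)$, if $\|\xi\| \ne \sqrt{d}/\sigma$. By continuity, we can recover $\phi_\alpha(\xi)$ for any $\xi$. The claim now follows from the fact that the characteristic function $\phi_\alpha$ determines $\alpha$ [22]. □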

The following examples illustrate how information about the shape of data can be extracted from multiscale Fréchet functions.

Example 7. We consider n = 400 data points distributed into two clusters of 200 points, each sampled from a Gaussian of variance 0.36 centered at different points. The data points are plotted in blue in Fig. 7(a), which also shows the empirical Fréchet function $V_n$ at scale $\sigma = 3$. The local minima of $V_n$ capture what are perceived as the "centers" of the two clusters at that scale. However, more information about the data distribution can be uncovered from $V_n$. For example, the local minima may be viewed as attractors of the (negative) gradient field $-\nabla V_n$, indicated by the arrows in the figure. The stable manifold of each attractor, which comprises the points that move toward the attractor under the associated flow, may be viewed as a cluster inferred from the data at that scale. These clusters are delimited by the repellers of the system, which correspond to the local maxima of $V_n$. Fig. 7(b) shows how $V_n$ varies across scales, highlighting the bifurcation of the attractors (in red) and repellers (in green) as $\sigma$ changes. In data analysis, such bifurcation diagrams may find several applications. For example, if the data represent the distribution of some phenotypic trait for two species that have evolved from a single group, the multiscale Fréchet function and the associated bifurcation diagram let us create an evolutionary model for the trait from the observed data.

Figure 7: (a) Fréchet function for data on the line (highlighted in blue) computed with the Gaussian kernel at scale $\sigma = 3$; (b) Fréchet function across scales.
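The attractors of Example 7 can be located with a few lines of code. The sketch below is an illustration under stated assumptions, not a reproduction of the figure: it assumes the Gaussian kernel $K(x,y,\sigma) = (2\pi\sigma^2)^{-1/2} \exp(-(y-x)^2/(2\sigma^2))$ on the line, uses hypothetical cluster centers at −3 and 3 and a hypothetical scale $\sigma = 1.5$ (the source does not state the centers), and replaces random sampling by evenly spaced Gaussian quantiles so that the output is deterministic.

```python
import math
from statistics import NormalDist


def frechet_1d(data, x, sigma):
    """Empirical multiscale Frechet function on the line (Gaussian kernel assumed)."""
    c = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return sum((y - x) ** 2 * c * math.exp(-(y - x) ** 2 / (2.0 * sigma ** 2))
               for y in data) / len(data)


def quantile_cluster(center, std, m):
    """Deterministic stand-in for m Gaussian samples: evenly spaced quantiles."""
    dist = NormalDist(center, std)
    return [dist.inv_cdf((k + 0.5) / m) for k in range(m)]


def local_minima(grid, values):
    """Grid points whose value is strictly below both neighbours."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


# Two clusters of 200 points with variance 0.36 (standard deviation 0.6).
data = quantile_cluster(-3.0, 0.6, 200) + quantile_cluster(3.0, 0.6, 200)
sigma = 1.5  # hypothetical scale chosen for this illustration
grid = [-8.0 + 0.05 * k for k in range(321)]
values = [frechet_1d(data, x, sigma) for x in grid]
attractors = local_minima(grid, values)

if __name__ == "__main__":
    print("attractors (local minima):", attractors)
```

Repeating the computation over a range of scales and recording how the local minima and maxima move yields a bifurcation diagram of the kind described in the example.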

Example 8. Here we consider the dataset in $\mathbb{R}^2$ shown in panel (a) of Fig. 8. Panels (b)-(h) show the Fréchet function for the Gaussian kernel calculated at increasing scales. The gradient field $-\nabla V_n$ at scale $\sigma = 2.25$ is depicted in panel (a) of Fig. 9 along with the two attractors $p_1$ and $p_2$ and their stable manifolds that were estimated numerically. The stable manifolds may be viewed as estimations at scale $\sigma = 2.25$ of clusters of the underlying probability measure $\alpha$ from which the data was sampled. Panel (b) shows the gradient field and the covariance tensors at the attractors depicted as ellipses with principal radii proportional to the square root of the eigenvalues of the covariance matrix. This may be viewed as a localized analogue of principal component analysis (PCA) that is able to uncover geometry that is not detectable with standard PCA. Analysis of the spectra of $\Sigma_n(p_i,\sigma)$, i = 1, 2, suggests that the data is organized around two one-dimensional clusters, whereas standard PCA is not sensitive to the local dimensionality because of the orientation of the clusters.

Figure 8: Heat maps of the multiscale Fréchet function for 2D data at increasing scales computed with the Gaussian kernel.

Figure 9: (a) 2D data, attractors and their stable manifolds at a fixed scale ($\sigma = 2.25$); (b) gradient vector field and covariance tensors at the attractors.

These examples are intended as proof-of-concept illustrations. Topological and other methods will be explored in forthcoming work for extraction of structural information from $V_\alpha$.

6. Hierarchical Manifold Clustering

Clustering is a central theme in pattern analysis with a rich history; cf. |3^. One of the most studied forms of the problem is that of partitioning a dataset into various subsets if there is some form of spatial separation of the data into subgroups. Motivated by problems in such areas as computer vision and video analysis, cf. |^, there has been a growing interest in clustering data that are organized as a finite union of possibly intersecting subspaces that have some special geometric structure [2S1 [IH1I2S]. As illustrated in the accompanying figure, the data may consist of noisy samples from an arrangement of (affine) linear subspaces of a Euclidean space such as a collection of lines in a plane, or an arrangement of lines and planes in $\mathbb{R}^3$. More generally, the clusters may comprise a finite collection of possibly non-linear, smooth submanifolds of a Euclidean space that intersect transversely. Here we propose an approach to manifold clustering based on CTFs. The basic idea is to use covariance fields to incorporate directional information at each data point. Formally, this is achieved via a section of the tensor bundle $\mathbb{R}^d \times (\mathbb{R}^d \otimes \mathbb{R}^d)$ over $\mathbb{R}^d$, as follows. Given a probability measure $\alpha$ and a multiscale kernel, let $\Sigma_\alpha(\cdot,\sigma)$ be the associated CTF. For each $\sigma > 0$, consider the section $\mathbb{R}^d \to \mathbb{R}^d \times (\mathbb{R}^d \otimes \mathbb{R}^d)$ given by $x \mapsto (x, \Sigma_\alpha(x,\sigma))$. On the total space of the tensor bundle, define the metric

\[ \|(x,\Sigma) - (x',\Sigma')\|_\gamma = \left( \|\Sigma - \Sigma'\|^2 + \gamma^2 \|x - x'\|^2 \right)^{1/2}, \tag{80} \]

where $x, x' \in \mathbb{R}^d$, $\Sigma, \Sigma' \in \mathbb{R}^d \otimes \mathbb{R}^d$, and $\gamma \ge 0$ is a parameter that balances the contributions of the spatial and tensor components. Note that $\|\cdot\|_0$ only defines a pseudo-metric since $\|\cdot\|_0$ disregards "horizontal" distances.

For any subset $X \subset \mathbb{R}^d$, we denote by $X_{\alpha,\gamma,\sigma}$ the metric space $(X, d_{\alpha;\gamma,\sigma})$, where

\[ d_{\alpha;\gamma,\sigma}(x,x') = \|(x, \Sigma_\alpha(x,\sigma)) - (x', \Sigma_\alpha(x',\sigma))\|_\gamma. \tag{81} \]

For a dataset $A = \{a_1, \ldots, a_n\} \subset \mathbb{R}^d$, the proposed clustering method is based on the single-linkage method applied to the finite metric space $A_{\alpha_n,\gamma,\sigma}$ associated with the empirical measure $\alpha_n = \sum_{i=1}^n \delta_{a_i}/n$. Equivalently, clustering is based on the n × n affinity matrix D whose (i, j)-entry is

\[ d_{ij} = d_{\alpha_n;\gamma,\sigma}(a_i, a_j). \tag{82} \]

Recall that single linkage on a finite metric space $\mathbb{A} = (A, d_A)$ starts from n clusters, each a singleton $\{a_i\}$, $1 \le i \le n$, sequentially merging the closest clusters until all data points coalesce into a single cluster. Closeness of two clusters, say $A_1, A_2 \subset A$, is measured by the inter-cluster distance

\[ d_A(A_1, A_2) = \min_{a \in A_1,\, a' \in A_2} d_A(a, a'). \tag{83} \]

We choose single linkage because it yields stable dendrograms, as expounded below, under assumptions on the probability measure from which the data is sampled that are not very restrictive. Combined with our stability and consistency results for covariance fields, this guarantees that the manifold clustering method is stable at all stages.
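As a concrete reading of (80) and (82), the sketch below builds the affinity matrix from a list of points and a list of covariance tensors that are assumed to have been computed already (for instance, the empirical CTF evaluated at each data point). The function names are hypothetical, tensors are stored as nested lists, and the tensor norm is taken to be the Frobenius norm.

```python
import math


def gamma_distance(x, S, x2, S2, gamma):
    """Distance (80) between (x, S) and (x2, S2) with the Frobenius tensor norm."""
    tensor_sq = sum((a - b) ** 2 for row, row2 in zip(S, S2) for a, b in zip(row, row2))
    space_sq = sum((a - b) ** 2 for a, b in zip(x, x2))
    return math.sqrt(tensor_sq + gamma ** 2 * space_sq)


def affinity_matrix(points, tensors, gamma):
    """n x n matrix with entries d_ij as in (82)."""
    n = len(points)
    return [[gamma_distance(points[i], tensors[i], points[j], tensors[j], gamma)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    pts = [(0.0, 0.0), (3.0, 4.0)]
    # Equal tensors: only the spatial term contributes, scaled by gamma.
    print(affinity_matrix(pts, [identity, identity], gamma=2.0))
    # With gamma = 0 distinct points can be at distance zero (pseudo-metric).
    print(affinity_matrix(pts, [identity, identity], gamma=0.0))
```

The second call illustrates why $\|\cdot\|_0$ is only a pseudo-metric: two distinct points with equal tensors are at distance zero.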

6.1. Dendrogram Stability

We denote a metric space by $\mathbb{X} = (X, d_X)$. An ultrametric space is a pair $(X, u_X)$, where $u_X : X \times X \to \mathbb{R}^+$ is a metric on X that satisfies the strong triangle inequality

\[ u_X(x,x') \le \max\{ u_X(x,x''), u_X(x'',x') \}, \tag{84} \]

for all $x, x', x'' \in X$. Any such function $u_X$ is called an ultrametric on X.

As proved in [32], dendrograms over a finite set X are in structure-preserving, bijective correspondence with ultrametrics on X. In this formulation, a hierarchical clustering method can be regarded as a map $\mathcal{H} : \mathcal{M} \to \mathcal{U}$ from finite metric spaces into finite ultrametric spaces. Henceforward, $\mathcal{H}$ will denote the map given by single linkage hierarchical clustering. It is known [32] that if $\mathbb{X} = (X, d_X) \in \mathcal{M}$, then $\mathcal{H}(\mathbb{X}) = (X, u_X)$ is given by

\[ u_X(x,x') = \min_{x = x_0, \ldots, x_r = x'} \max_i \, d_X(x_i, x_{i+1}). \tag{85} \]

The minimum above is taken over all finite ordered sequences $x_0, x_1, \ldots, x_r$ of points in X such that $x_0 = x$ and $x_r = x'$. If $x, x' \in X$, then $u_X(x,x')$ may be interpreted as the dendrogram level at which the clusters containing x and x' first merge. This is known as the cophenetic distance between x and x'.
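Formula (85) is a minimax path problem, so on a small dataset the single linkage ultrametric can be computed directly from a distance matrix with a Floyd-Warshall style recursion. The sketch below is one possible implementation, chosen for clarity rather than speed (it runs in cubic time), and the function name is hypothetical.

```python
def single_linkage_ultrametric(D):
    """Ultrametric u from (85): smallest possible largest step over all chains.

    D is a symmetric n x n distance matrix given as nested lists.
    """
    n = len(D)
    u = [row[:] for row in D]  # start from the direct distances
    for k in range(n):  # allow chains that pass through point k
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


if __name__ == "__main__":
    xs = [0.0, 1.0, 3.0, 7.0]  # four points on the line
    D = [[abs(a - b) for b in xs] for a in xs]
    for row in single_linkage_ultrametric(D):
        print(row)
```

For the four points on the line, the cophenetic distance between the first and last point equals the largest gap between consecutive points, namely 4.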
The main goal of this section is to formulate and prove stability of the map $\mathcal{M} \ni \mathbb{X} \mapsto \mathcal{H}(\mathbb{X}) \in \mathcal{U}$. The question of stability of single linkage clustering can be approached using ideas related to the Gromov-Hausdorff distance HU], as follows. A correspondence R between two sets X and Y is a subset of $X \times Y$ such that $\pi_1(R) = X$ and $\pi_2(R) = Y$, where $\pi_1$ and $\pi_2$ denote projections onto the first and second factors. Given X and Y, we denote by $\mathcal{R}(X,Y)$ the set of all correspondences between X and Y.

Definition 6. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces.

(i) The distortion of a correspondence R between X and Y is defined by

\[ \mathrm{dis}(R; \mathbb{X}, \mathbb{Y}) := \max_{(x,y),(x',y') \in R} |d_X(x,x') - d_Y(y,y')|. \]

(ii) The Gromov-Hausdorff distance between $\mathbb{X}$ and $\mathbb{Y}$ is given by

\[ d_{GH}(\mathbb{X}, \mathbb{Y}) := \frac{1}{2} \inf_R \, \mathrm{dis}(R; \mathbb{X}, \mathbb{Y}), \]

where the infimum is taken over all correspondences between X and Y.

The following stability result is a generalization of [22, Proposition 26].

Proposition 5. For any $\mathbb{X}, \mathbb{Y} \in \mathcal{M}$ and any correspondence $R \in \mathcal{R}(X,Y)$,

\[ \mathrm{dis}(R; \mathcal{H}(\mathbb{X}), \mathcal{H}(\mathbb{Y})) \le \mathrm{dis}(R; \mathbb{X}, \mathbb{Y}). \]

As a consequence, $d_{GH}(\mathcal{H}(\mathbb{X}), \mathcal{H}(\mathbb{Y})) \le d_{GH}(\mathbb{X}, \mathbb{Y})$.

Remark 8. The claim of the proposition may be written, equivalently, as follows. If $u_X$ and $u_Y$ denote the ultrametrics produced by single linkage hierarchical clustering on $\mathbb{X}$ and $\mathbb{Y}$, then

\[ |u_X(x,x') - u_Y(y,y')| \le \max_{(x,y),(x',y') \in R} |d_X(x,x') - d_Y(y,y')|, \tag{86} \]

for any correspondence R between X and Y and all $(x,y), (x',y') \in R$.

Proof of Proposition 5. We prove (86). Given a correspondence $R \in \mathcal{R}(X,Y)$ and $(x,y), (x',y') \in R$, let $x = x_0, x_1, \ldots, x_n = x'$ in X be such that $\max_i d_X(x_i, x_{i+1}) = u_X(x,x')$. Let $y_0 = y$, $y_n = y'$ and choose $y_1, \ldots, y_{n-1} \in Y$ such that $(x_i, y_i) \in R$ for all $i = 1, \ldots, n-1$. This is possible since any correspondence R satisfies $\pi_1(R) = X$. Notice that

\[ \begin{aligned}
u_Y(y,y') &\le \max_i \, d_Y(y_i, y_{i+1}) \\
&\le \max_i \left( d_X(x_i, x_{i+1}) + |d_X(x_i, x_{i+1}) - d_Y(y_i, y_{i+1})| \right) \\
&\le \max_i \, d_X(x_i, x_{i+1}) + \max_{(x,y),(x',y') \in R} |d_X(x,x') - d_Y(y,y')| \\
&= u_X(x,x') + \max_{(x,y),(x',y') \in R} |d_X(x,x') - d_Y(y,y')|.
\end{aligned} \tag{87} \]

The claim follows since (87) also holds if we reverse the roles of X and Y. □
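Proposition 5 can be checked numerically on small examples. The sketch below computes the distortion of a correspondence before and after applying single linkage. The two three-point spaces and the correspondence are hypothetical choices made for illustration, and the ultrametric is computed with the minimax recursion for (85).

```python
def single_linkage_ultrametric(D):
    """Single linkage ultrametric (85) via a minimax Floyd-Warshall recursion."""
    n = len(D)
    u = [row[:] for row in D]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def distortion(R, dX, dY):
    """dis(R; X, Y) for a correspondence R given as a list of index pairs (i, j)."""
    return max(abs(dX[i][k] - dY[j][l]) for (i, j) in R for (k, l) in R)


def line_metric(xs):
    """Distance matrix of points on the real line."""
    return [[abs(a - b) for b in xs] for a in xs]


dX = line_metric([0.0, 1.0, 3.0])
dY = line_metric([0.0, 1.5, 3.2])
R = [(0, 0), (1, 1), (2, 2)]  # pair the points in order

before = distortion(R, dX, dY)
after = distortion(R, single_linkage_ultrametric(dX), single_linkage_ultrametric(dY))

if __name__ == "__main__":
    print("distortion on metrics:", before)
    print("distortion on ultrametrics:", after)
```

In this example both distortions are approximately 0.5, so the inequality of Proposition 5 holds with equality up to rounding.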

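Lemma 2. Let $\alpha, \beta \in \mathcal{P}_\infty(\mathbb{R}^d)$ and $\sigma > 0$. If a kernel satisfies the conditions of Lemma 1, then

\[ \sup_{(a,b) \in R_\mu} \|\Sigma_\alpha(a,\sigma) - \Sigma_\beta(b,\sigma)\| \le \frac{2 \sigma A_f}{C_d(\sigma)} \sup_{(a,b) \in R_\mu} \|a - b\|, \]

for any coupling $\mu \in \Gamma(\alpha,\beta)$, where $R_\mu := \mathrm{supp}[\mu]$ and $A_f > 0$ is as in Lemma 1.

Proof. Set $\zeta = \sup_{(a,b) \in \mathrm{supp}[\mu]} \|a - b\|$. Let $\mu \in \Gamma(\alpha,\beta)$ and $(a,b), (y,y') \in \mathrm{supp}[\mu]$. In the notation of Lemma 1, setting $z_1 = y - a$ and $z_2 = y' - b$, we have

\[ \|Q_\sigma(y - a) - Q_\sigma(y' - b)\| \le \frac{\sigma A_f}{C_d(\sigma)} \left( \|y - y'\| + \|a - b\| \right) \le \frac{2 \sigma A_f}{C_d(\sigma)} \, \zeta, \tag{88} \]

where in the last inequality we used $\|y - y'\| \le \zeta$ and $\|a - b\| \le \zeta$. Since

\[ \|\Sigma_\alpha(a,\sigma) - \Sigma_\beta(b,\sigma)\| \le \iint \|Q_\sigma(y - a) - Q_\sigma(y' - b)\| \, \mu(dy \times dy'), \tag{89} \]

the lemma follows. □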


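Lemma 3 (Lemma 2.2 of [31]). Let $\alpha, \beta \in \mathcal{P}_\infty(\mathbb{R}^d)$. Then, for any coupling $\mu \in \Gamma(\alpha,\beta)$, $R_\mu = \mathrm{supp}[\mu]$ gives a correspondence between $A = \mathrm{supp}[\alpha]$ and $B = \mathrm{supp}[\beta]$.

Theorem 7. Let $\alpha, \beta \in \mathcal{P}_\infty(\mathbb{R}^d)$, $A = \mathrm{supp}[\alpha]$, $B = \mathrm{supp}[\beta]$, $\sigma > 0$ and $\gamma \ge 0$. Then, for any kernel satisfying the conditions of Lemma 1,

\[ d_{GH}(A_{\alpha,\gamma,\sigma}, B_{\beta,\gamma,\sigma}) \le \left( \frac{2 \sigma A_f}{C_d(\sigma)} + \gamma \right) W_\infty(\alpha,\beta), \]

with $A_f > 0$ as in Lemma 1.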
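Proof. Let $\mu \in \Gamma(\alpha,\beta)$ be a coupling that realizes $W_\infty(\alpha,\beta)$. By Lemma 3, $R_\mu = \mathrm{supp}[\mu]$ is a correspondence between A and B. Thus,

\[ \begin{aligned}
d_{GH}(A_{\alpha,\gamma,\sigma}, B_{\beta,\gamma,\sigma}) &\le \frac{1}{2} \, \mathrm{dis}(R_\mu; A_{\alpha,\gamma,\sigma}, B_{\beta,\gamma,\sigma}) \\
&= \frac{1}{2} \sup_{(a,b),(a',b') \in R_\mu} |d_{\alpha;\gamma,\sigma}(a,a') - d_{\beta;\gamma,\sigma}(b,b')| \\
&\le \frac{1}{2} \sup_{(a,b),(a',b') \in R_\mu} \left( \|\Sigma_\alpha(a,\sigma) - \Sigma_\beta(b,\sigma)\| + \|\Sigma_\alpha(a',\sigma) - \Sigma_\beta(b',\sigma)\| + \gamma \|a - b\| + \gamma \|a' - b'\| \right) \\
&\le \sup_{(a,b) \in R_\mu} \|\Sigma_\alpha(a,\sigma) - \Sigma_\beta(b,\sigma)\| + \gamma \sup_{(a,b) \in R_\mu} \|a - b\| \\
&\le \left( \frac{2 \sigma A_f}{C_d(\sigma)} + \gamma \right) \sup_{(a,b) \in R_\mu} \|a - b\|,
\end{aligned} \tag{90} \]

where the last step follows from Lemma 2. The conclusion follows since $W_\infty(\alpha,\beta) = \sup_{(a,b) \in R_\mu} \|a - b\|$. □

Combining Proposition 5 and Theorem 7, we obtain:

Corollary 8 (Stability of Hierarchical Manifold Clustering). Let $\alpha, \beta \in \mathcal{P}_\infty(\mathbb{R}^d)$ be probability measures with finite support, $A = \mathrm{supp}[\alpha]$, $B = \mathrm{supp}[\beta]$, $\sigma > 0$ and $\gamma \ge 0$. Then,

\[ d_{GH}\left( \mathcal{H}(A_{\alpha,\gamma,\sigma}), \mathcal{H}(B_{\beta,\gamma,\sigma}) \right) \le \left( \frac{2 \sigma A_f}{C_d(\sigma)} + \gamma \right) W_\infty(\alpha,\beta). \]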


6.2. Comments About Consistency of Hierarchical Manifold Clustering

A question that our paper leaves open is whether, under a reasonable generative model for the sampling from a collection of intersecting manifolds, one may be able to prove that the empirical dendrogram converges in probability to a dendrogram that represents the spatial organization of the underlying manifolds. In the simpler context of clustering n i.i.d. samples $\{x_1, \ldots, x_n\}$ from a compact metric space $(X, d_X)$ endowed with a Borel probability measure $\alpha_X$, it has been established in [22] that single linkage hierarchical clustering converges to a dendrogram whose hierarchical structure depends on the support of $\alpha_X$ in a precise way.

In the case of flat clustering, i.e. when the goal is to obtain a single partition of the dataset, consistency results for some multi-manifold clustering methods based on local PCA are given in [El fTTl ITS].

6.3. Examples and Applications

Let $X = \{x_1, \ldots, x_n\}$ be a dataset in $\mathbb{R}^d$. For $\gamma, \sigma > 0$, we apply the single linkage method to the metric space $X_{\alpha_n,\gamma,\sigma} = (X, d_{\alpha_n;\gamma,\sigma})$, where $\alpha_n$ is the empirical measure associated with X and $d_{\alpha_n;\gamma,\sigma}$ is the distance defined

in d^ . The ultrametric associated with is abbreviated 

In this setting, analyzing informative dendrogram cutoff levels often is an important task, which can be approached in different ways, depending on the nature of the problem. For example, a cutoff level h may be based on a pre-assigned number of clusters, be learned from training data, or be more exploratory. We give examples that illustrate all three viewpoints.

Example 9 (Lines and planes). In this experiment we consider the unlabeled point cloud shown in panel (a) of the accompanying figure, which represents an arrangement of two parallel planes and two lines that intersect the planes transversely. Each plane contains 225 points on a uniform grid and each line contains 30 equally spaced points. Cutting the dendrogram at four clusters, our method finds the four affine linear subspaces accurately with the Gaussian kernel at $\sigma = 0.6$. In this case, it is important to choose $\gamma \ne 0$ since the spatial component of $d_{\alpha_n;\gamma,\sigma}$ is needed to discriminate the parallel planes. In the figure, the points are colored according to cluster membership. In this case, the covariance tensors at data points on the planes that are away from the cluster intersections have two dominating eigenvalues, whereas for points on the lines they have only one such eigenvalue. Thus, an analysis of the spectrum of the covariance tensors at data points lets us infer the dimension of each cluster.

Example 10 (A Line Arrangement). In this example, the point-cloud data represents three intersecting lines in $\mathbb{R}^2$, as shown in Fig. 10(a). Each line segment is sampled at 200 equally spaced points. Since the slopes of the lines are different, we expect the covariance matrices to be able to cluster the points without the aid of additional spatial information. Thus, we set $\gamma = 0$ and $\sigma = 0.4$. The number of clusters was set to six to test the ability of the algorithm to detect not only the lines, but also the three intersections. Fig. 10(b) shows the single-linkage dendrogram, highlighting each of the six clusters. The data points are colored according to cluster membership. As expected, well delineated clusters are detected away from the intersection points because the covariance matrices are highly anisotropic with principal axes that align well with the corresponding line segments. Although the covariance matrices are not as anisotropic near the intersection points, there are enough differences in their behavior near the three intersection loci for the algorithm to be able to place them into different clusters.

Figure 10: (a) an arrangement of three lines and (b) clustering dendrogram; (c) noisy lines with outliers and (d) clustering dendrogram.

The next two examples are of a more exploratory nature in that the dendrogram cutoff was chosen through experimentation with the data.

Example 11 (Noisy Lines with Outliers). This is a noisy version of Example 10, as shown in Fig. 10(c). As before, each line is represented by 200 points, but we have added Gaussian noise of width 0.015 to the data, as well as 180 outliers sampled from the uniform distribution on a rectangle containing the lines. Because of the nature of the data, the number of clusters was set to m = 80 so that the three main clusters did not get merged because of the outliers. The figure also shows a line fitted to each of the three largest clusters using principal component analysis. The method was able to sharply recover the three lines, even in the presence of noise and outliers. The majority of the 80 clusters are singletons of outliers and these are colored black in the figure. We remark that the choice of $\sigma = 0.51$ is crucial when dealing with data contaminated by noise. In this case, it was also important to set $\gamma \ne 0$ to better cope with noise.

Example 12 (Floor cracks). We apply the clustering method to segmentation of two images of concrete floor cracks. Panel (a) of Fig. 11 shows the original images, whereas panel (b) shows binary images obtained from an edge detection algorithm. We cluster the foreground pixels of the binary images. As in Example 11, it is important to allow a fairly large number of clusters so that the clusters that detect the main cracks do not get merged because of the noisy pixels. Panels (c) and (d) show the outputs (not to scale) of the clustering algorithm.



Figure 11: (a) original and (b) processed images of floor cracks; (c) and (d) show clustering 
based on CTFs. 
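
The line-fitting step mentioned in Example 11 can be illustrated with a minimal sketch. It assumes that the "width 0.015" of the noise is the standard deviation of isotropic Gaussian noise and that a cluster is available as an array of planar points. The function name `fit_line_pca` and the synthetic cluster are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def fit_line_pca(points):
    """Fit a line to a cluster: return its centroid and principal direction."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Covariance of the centered cluster; eigh returns eigenvalues in
    # ascending order, so the last eigenvector spans the fitted line.
    cov = np.cov(points - centroid, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return centroid, eigvecs[:, -1]


# Hypothetical cluster: 200 points on a segment through the origin,
# perturbed by Gaussian noise (standard deviation 0.015, an assumption).
rng = np.random.default_rng(0)
true_dir = np.array([1.0, 2.0]) / np.sqrt(5.0)
t = rng.uniform(-1.0, 1.0, size=200)
cluster = np.outer(t, true_dir) + rng.normal(scale=0.015, size=(200, 2))

centroid, direction = fit_line_pca(cluster)
# The sign of an eigenvector is arbitrary, so compare directions up to sign.
alignment = abs(float(direction @ true_dir))
```

The fitted line is the set of points `centroid + s * direction` for real `s`; in this sketch `alignment` measures how well the recovered direction agrees with the one used to generate the data.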


To further test the ability of the method to cluster intersecting manifolds, 
we experimented with synthetic data comprising multiple arrangements such 
as the intersecting lines in Fig. 10(b). 


Example 13. We consider three synthetic datasets of point clouds representing 
random arrangements of: (i) three line segments in $\mathbb{R}^2$; (ii) four 
curves in $\mathbb{R}^2$ that are either line segments or arcs of parabolas; and 
(iii) three patches of planes in $\mathbb{R}^3$. Each of these datasets contains 
a total of 250 point clouds, 50 used for training the algorithm and 200 test 
samples. The points in each point cloud are labeled to allow quantification of 
the accuracy of the output of the algorithm. Fig. 12 shows a few samples from 
each of these datasets. 
Parameter values that optimize classification performance are learned from 
the training samples. Note, however, that even though the number $k$ of 
clusters is known, specifying a height $h$ that yields precisely $k$ clusters may 
yield undesirable results. For example, important clusters representing 
different components of an arrangement of manifolds may get merged due to 
the presence of outliers or the behavior near the intersections. Thus, it is 
often preferable to choose a lower cutoff level, before this phenomenon occurs, 
at the expense of getting a larger number $\ell$ of clusters. In such situations, 
we select the largest $k$ clusters and assign each point in the remaining $(\ell - k)$ 
clusters to the closest of the top $k$ clusters. Experiments indicate that a good 
baseline for the cutoff level $h$ is the mean cophenetic distance, which for a 
point cloud $\{x_1, \ldots, x_n\} \subset \mathbb{R}^d$ is given by 

$$h_0 = \binom{n}{2}^{-1} \sum_{i<j} c(x_i, x_j), \tag{91}$$

where $c(x_i, x_j)$ denotes the cophenetic distance between $x_i$ and $x_j$. 
In the learning phase, we typically search for $h$ in a neighborhood of $h_0$ whose 
width is determined by the variance of the distribution of the cophenetic 
distances. 

Figure 12: Random arrangements of line segments (row 1), segments of lines and parabolas 
(row 2), and plane patches (row 3). 

With the learned parameter values, the algorithm performs well in all 
three cases. For each point cloud we count the number of misclassified points 
and calculate the average error (AE) and the mean error (ME) rates over all 
test samples, obtaining: 

(i) arrangements of lines: 9.59% (AE) and 4.17% (ME); 

(ii) lines and parabolas: 9.93% (AE) and 3.38% (ME); 

(iii) arrangements of planes: 7.00% (AE) and 2.42% (ME). 

As expected, a closer inspection of the results reveals that most of the errors 
occur at points near the intersections of the clusters, where the covariance 
tensors are not as informative for clustering purposes. 

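
The cutoff heuristic of Example 13 can be sketched with SciPy's hierarchical clustering routines. This is only an illustration under stated assumptions: the paper clusters features derived from CTFs, whereas the sketch clusters raw planar points with single linkage; "closest" is interpreted here as nearest-point Euclidean distance; and the function name `cluster_with_cutoff` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import cdist


def cluster_with_cutoff(features, k, h=None, method="single"):
    """Cut a dendrogram at height h, keep the k largest clusters and
    reassign every remaining point to the closest kept cluster."""
    features = np.asarray(features, dtype=float)
    Z = linkage(features, method=method)
    if h is None:
        # Baseline cutoff h_0: mean of the cophenetic distances over i < j.
        h = cophenet(Z).mean()
    raw = fcluster(Z, t=h, criterion="distance")
    ids, sizes = np.unique(raw, return_counts=True)
    top = ids[np.argsort(sizes)[::-1][:k]]
    labels = np.full(len(features), -1)
    for new_id, old_id in enumerate(top):
        labels[raw == old_id] = new_id
    rest = np.flatnonzero(labels == -1)
    if rest.size:
        kept = np.flatnonzero(labels >= 0)
        # Assumption: "closest" means nearest point of a kept cluster.
        nearest = np.argmin(cdist(features[rest], features[kept]), axis=1)
        labels[rest] = labels[kept[nearest]]
    return labels, h


# Toy data: three tight blobs and one outlier between them.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
blobs = [c + rng.normal(scale=0.1, size=(30, 2)) for c in centers]
data = np.vstack(blobs + [np.array([[5.0, 5.0]])])

labels, h0 = cluster_with_cutoff(data, k=3)
```

In this toy configuration the cut at `h0` separates the outlier into its own cluster, so the number of clusters exceeds `k`, and the final step folds the outlier into one of the three main clusters.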
7. Concluding Remarks 

We introduced the notion of multiscale covariance tensor fields associated 
with Euclidean random variables and developed a framework for the 
systematic study of the shape of data using localized covariance tensors. We 
investigated foundational questions such as stability and consistency of 
multiscale CTFs, provided illustrations of how CTFs let us uncover geometry 
underlying data, and applied the methods to manifold clustering. We also 
introduced multiscale Fréchet functions, which are scalar fields derived from 
CTFs that fully capture the distributions of random vectors. Multiscale 
Fréchet functions are particularly well suited for extension of the methods 
of this paper to non-Euclidean random variables, a problem that is receiving 
ever increasing attention in data science. In this setting, the goal is to devise 
methods that can cope with random variables taking values in spaces such as 
Riemannian manifolds and more general metric spaces. Unless restrictive 
assumptions are imposed on the sample space and the distributions, CTFs may 
be difficult to define in this nonlinear realm. In contrast, the Fréchet function 
formulation can be easily extended to metric spaces supporting a diffusion 
kernel [32]. In forthcoming work, we will investigate theoretical and 
computational aspects of such extensions, including the accessibility of information 
residing in multiscale Fréchet functions, a problem that poses computational 
challenges even in the case of high-dimensional Euclidean random variables. 

In this paper, we only considered radial basis kernels; however, many 
results extend easily to more general kernels. We emphasized the multiscale 
formulation largely because of the questions that motivated this work. 
Nevertheless, the majority of the results apply to kernels that are not scale 
dependent. 

Covariance tensor fields also suggest ways of formalizing the notion of 
shape of Euclidean data and probability measures. For example, for a 
distribution $\alpha$ with the property that the covariance tensor field 
$\Sigma_\alpha(\cdot, \sigma)$ associated with a smooth kernel (such as the Gaussian 
kernel) is non-singular for every $x \in \mathbb{R}^d$, $\Sigma_\alpha^{-1}(\cdot, \sigma)$ 
defines a metric tensor with close ties to $\alpha$. This poses 
the problem of uncovering relationships between Riemannian metrics derived 
from CTFs, such as $\Sigma_\alpha^{-1}(\cdot, \sigma)$, and the shape of $\alpha$. 

In a different direction, for a fixed point $x \in \mathbb{R}^d$, an interesting problem 
is that of capturing the values of $\sigma$ for which $\Sigma_\alpha(x, \sigma)$ exhibits a "jump" 
in behavior. This study, in the context of images, gives rise to notions of 
local scales. Knowledge of local scales for each point $x$ leads to criteria 
for selecting important, salient points in the spirit of SIFT [33, 34]. The 
concept of local scales arose first in the context of images [33] and was later 
extended to probability distributions [35]. The notions of local scales in [33, 
35] were isotropic. Thus, future developments related to characterizing shape 
using CTFs are suggested by the possibility of constructing notions of local 
scales on general shapes [35, 12] which, by exploiting the tensorial nature of 
$\Sigma_\alpha(x, \sigma)$, become sensitive to direction. 
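
The scale dependence discussed above can be explored numerically with a small sketch. It assumes that the empirical CTF at a point $x$ and scale $\sigma$ is the kernel-weighted average of the outer products $(y_i - x)(y_i - x)^T$ with an unnormalized Gaussian kernel; the normalization used in the paper may differ, and the function names and the two-line dataset are hypothetical.

```python
import numpy as np


def empirical_ctf(points, x, sigma):
    """Kernel-weighted covariance tensor about x at scale sigma.

    Assumption: unnormalized Gaussian kernel exp(-|y - x|^2 / (2 sigma^2)).
    """
    diffs = np.asarray(points, dtype=float) - np.asarray(x, dtype=float)
    weights = np.exp(-np.sum(diffs**2, axis=1) / (2.0 * sigma**2))
    # Sum of w_i * (y_i - x)(y_i - x)^T, averaged over the sample.
    return (weights[:, None] * diffs).T @ diffs / len(diffs)


def isotropy_ratio(tensor):
    """Smallest over largest eigenvalue: near 0 means highly anisotropic."""
    eigvals = np.linalg.eigvalsh(tensor)  # ascending order
    return eigvals[0] / eigvals[-1]


# Two perpendicular segments crossing at the origin.
t = np.linspace(-1.0, 1.0, 200)
zeros = np.zeros_like(t)
points = np.vstack([np.column_stack([t, zeros]),
                    np.column_stack([zeros, t])])

x = np.array([0.5, 0.0])  # a point on the horizontal segment
sigmas = [0.1, 0.5, 2.0]
ratios = [isotropy_ratio(empirical_ctf(points, x, s)) for s in sigmas]
```

At the smallest scale the tensor at `x` only sees the horizontal segment and is nearly rank one, while at the largest scale both segments contribute and the tensor becomes much less anisotropic. Tracking where such a ratio changes abruptly along a finer grid of scales is one hedged way to look for the "jumps" in behavior mentioned in the text.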

Data Accessibility 

The synthetic data used in the manifold clustering experiments is available 
at https://bitbucket.org/diegodiaz-math/ctf-files/. 